RE: Recommendations using Spark

2016-01-07 Thread Singh, Abhijeet
The question itself is very vague. You might want to use this slide deck as a starting point: http://www.slideshare.net/CasertaConcepts/analytics-week-recommendations-on-spark. From: anjali gautam [mailto:anjali.gauta...@gmail.com] Sent: Friday, January 08, 2016 12:42 PM To: user@spark.apache.org

RE: Spark streaming driver java process RSS memory constantly increasing using cassandra driver

2015-12-14 Thread Singh, Abhijeet
this link; it might give some starting points if you are a newbie. You might already know this otherwise. Thanks, Abhijeet -Original Message- From: Conor Fennell [mailto:conorapa...@gmail.com] Sent: Monday, December 14, 2015 8:29 PM To: Singh, Abhijeet Cc: user@spark.apache.org Subject: Re

RE: Spark streaming driver java process RSS memory constantly increasing using cassandra driver

2015-12-14 Thread Singh, Abhijeet
Hi Conor, What do you mean when you say the leak is not in "Heap or non-Heap"? If it is not heap related, then it has to be native memory that is leaking. I can't say for sure, but you do have threads running there, and those could be consuming native memory. We didn't get any screenshots of JConsole.
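As a side note, the JVM's Native Memory Tracking (`-XX:NativeMemoryTracking=summary` plus `jcmd <pid> VM.native_memory summary`) can attribute native allocations when the heap looks stable. Separately, here is a minimal sketch (not from the thread; the helper name and thresholds are made up) of how one might confirm that periodically sampled RSS values really do trend upward rather than just jitter with GC activity:

```python
# Hypothetical helper (not part of Spark or the JVM): given periodic RSS
# samples (in MB) of the driver process, report whether memory grows
# steadily, which suggests a native-memory leak when the heap is flat.

def is_steadily_growing(samples, min_growth_mb=1.0):
    """Return True if RSS trends upward across the sampled window."""
    if len(samples) < 2:
        return False
    # Compare the averages of the first and last thirds of the window
    # to smooth out jitter in individual samples.
    third = max(1, len(samples) // 3)
    head = sum(samples[:third]) / third
    tail = sum(samples[-third:]) / third
    return tail - head >= min_growth_mb

growing = [512, 514, 513, 520, 524, 529, 535, 540, 548]
stable = [512, 514, 511, 513, 512, 514, 513, 512, 511]
print(is_steadily_growing(growing))  # True
print(is_steadily_growing(stable))   # False
```

A flat heap combined with this kind of monotonic RSS growth is what points the investigation at off-heap allocations (threads, direct buffers, native libraries) rather than Java objects.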

RE: Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread Singh, Abhijeet
You need to maintain the offsets yourself, and rightly so, in something like ZooKeeper. From: Tao Li [mailto:litao.bupt...@gmail.com] Sent: Tuesday, December 08, 2015 5:36 PM To: user@spark.apache.org Subject: Need to maintain the consumer offset by myself when using spark streaming kafka direct
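To illustrate the pattern being recommended: with the direct approach, each batch exposes its offset range, and you commit that range to external storage only after the batch has been processed, then read it back on restart. The sketch below is plain Python with hypothetical names (it is not the Spark or Kafka API; a dict stands in for ZooKeeper znodes):

```python
# Illustrative sketch of self-managed offset bookkeeping. The dict below
# stands in for one ZooKeeper znode per (topic, partition).
committed = {}

def process_batch(topic, partition, records, from_offset):
    # ... process the records idempotently first, then commit, so a crash
    # between the two steps causes reprocessing rather than data loss ...
    until_offset = from_offset + len(records)
    committed[(topic, partition)] = until_offset
    return until_offset

def resume_offset(topic, partition):
    # On restart, read the stored offset instead of relying on a
    # checkpoint; this is the point of maintaining offsets yourself.
    return committed.get((topic, partition), 0)

process_batch("events", 0, ["a", "b", "c"], from_offset=0)
process_batch("events", 0, ["d", "e"], from_offset=3)
print(resume_offset("events", 0))  # 5
```

Committing after processing gives at-least-once semantics; committing offsets and results in one atomic store is what upgrades this to exactly-once.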

RE: parquet file doubts

2015-12-07 Thread Singh, Abhijeet
Yes, Parquet has min/max statistics. From: Cheng Lian [mailto:l...@databricks.com] Sent: Monday, December 07, 2015 11:21 AM To: Ted Yu Cc: Shushant Arora; user@spark.apache.org Subject: Re: parquet file doubts Oh sorry... At first I meant to cc spark-user list since Shushant and I had discussed some
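For context on why those statistics matter: Parquet stores min/max values per row group (and per page), and a reader can skip any row group whose range cannot satisfy a filter, without touching its data. A toy sketch of that pruning logic (plain Python, not Parquet's actual reader code; the row-group values are invented):

```python
# Sketch of min/max-based row-group pruning. A filter like `value == 42`
# only needs to scan row groups whose [min, max] range can contain 42.
row_groups = [
    {"min": 0, "max": 30, "rows": 1000},
    {"min": 31, "max": 60, "rows": 1000},
    {"min": 61, "max": 100, "rows": 1000},
]

def groups_to_scan(groups, value):
    """Keep only row groups whose statistics can match the predicate."""
    return [g for g in groups if g["min"] <= value <= g["max"]]

survivors = groups_to_scan(row_groups, 42)
print(len(survivors))  # 1 (only the second row group is read)
```

The skipping is most effective when the data is sorted or clustered on the filtered column, so that the per-group ranges do not overlap much.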

RE: Spark and Kafka Integration

2015-12-07 Thread Singh, Abhijeet
For Q2: the order of the logs within each partition is guaranteed, but there is no such thing as a global order across partitions. From: Prashant Bhardwaj [mailto:prashant2006s...@gmail.com] Sent: Monday, December 07, 2015 5:46 PM To: user@spark.apache.org Subject: Spark and Kafka Integration Hi Some
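The per-partition guarantee can be made concrete with a small simulation (plain Python, not the Kafka client API; partition contents are invented): however the consumer interleaves partitions, each partition's messages come back in their original order, while the interleaving itself is arbitrary.

```python
# Sketch of Kafka's ordering guarantee: per-partition order is preserved,
# but the interleaving across partitions is arbitrary, so no global order.
partitions = {
    0: ["p0-m1", "p0-m2", "p0-m3"],
    1: ["p1-m1", "p1-m2"],
}

def consume(parts):
    """Round-robin across partitions: one arbitrary but valid interleaving."""
    seen = []
    iters = {p: iter(msgs) for p, msgs in parts.items()}
    while iters:
        for p in list(iters):
            try:
                seen.append(next(iters[p]))
            except StopIteration:
                del iters[p]
    return seen

out = consume(partitions)
# Projecting the interleaved stream back onto one partition recovers
# that partition's original order.
print([m for m in out if m.startswith("p0")])  # ['p0-m1', 'p0-m2', 'p0-m3']
```

If a global order is genuinely required, the usual options are a single partition (at the cost of parallelism) or ordering downstream on a timestamp or sequence number embedded in the messages.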