Re: unit testing in spark

2017-04-05 Thread Shiva Ramagopal
Hi, I've been following this thread for a while. I'm trying to bring in a test strategy in my team to test a number of data pipelines before production. I have watched Lars' presentation and find it great. However, I'm debating whether unit tests are worth the effort if there are good job-level

Re: RE: Fast write datastore...

2017-03-16 Thread Shiva Ramagopal
> Thanks,
> Muthu
>
> On Wed, Mar 15, 2017 at 10:55 AM, vvshvv <vvs...@gmail.com> wrote:
> > Hi muthu,
> >
> > I agree with Shiva, Cassandra also supports SASI indexes, which can
> > partially replace Elasticsearch functionality.

Re: Fast write datastore...

2017-03-15 Thread Shiva Ramagopal
>> va, Cassandra also supports SASI indexes, which can
>> partially replace Elasticsearch functionality.
>>
>> Regards,
>> Uladzimir
>>
>> Sent from my Mi phone
>> On Shiva Ramagopal <tr.s...@gmail.com>, Mar 15, 2017 5:57 PM wrote:

Re: Fast write datastore...

2017-03-15 Thread Shiva Ramagopal
Cassandra is probably a good choice if you are mainly looking for a datastore that supports fast writes. You can ingest the data into a table and define one or more materialized views on top of it to support your queries. Since you mention that your queries are going to be simple, you can define
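To make the table-plus-materialized-view idea concrete, here is a minimal sketch that holds the two CQL statements as Python strings. The keyspace `demo`, the table `events`, the view `events_by_user` and all column names are hypothetical and not from the thread; the statements would still need to be run through cqlsh or a driver session, which is not shown here.

```python
# Hypothetical schema: keyspace, table, view and column names are
# illustrative assumptions, not taken from the original discussion.

# Base table that receives the fast writes, keyed by event_id.
BASE_TABLE = """
CREATE TABLE IF NOT EXISTS demo.events (
    event_id uuid,
    user_id  text,
    payload  text,
    PRIMARY KEY (event_id)
)
"""

# Materialized view that re-keys the same rows by user_id so that
# simple "all events for a user" queries hit a single partition.
# Every primary key column of the view must be restricted with IS NOT NULL.
EVENTS_BY_USER = """
CREATE MATERIALIZED VIEW IF NOT EXISTS demo.events_by_user AS
    SELECT event_id, user_id, payload
    FROM demo.events
    WHERE user_id IS NOT NULL AND event_id IS NOT NULL
    PRIMARY KEY (user_id, event_id)
"""


def schema_statements():
    """Return the CQL statements in the order they should be executed."""
    return [BASE_TABLE.strip(), EVENTS_BY_USER.strip()]
```

Writes go only to `demo.events`; Cassandra keeps the view in sync. Newer Cassandra releases treat materialized views as experimental, so it is worth checking the version in use before relying on them.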

Re: Spark / Elasticsearch Error: Maybe ES was overloaded? How to throttle down Spark as it writes to ES

2017-01-18 Thread Shiva Ramagopal
Using a queue like RabbitMQ between Spark and ES could probably help, by buffering the Spark output when ES can't keep up. Some links:
1. ES-RabbitMQ River - https://github.com/elastic/elasticsearch-river-rabbitmq/blob/master/README.md
2. Using RabbitMQ with ELK -
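The buffering idea can be illustrated without any broker at all. The sketch below is an in-process stand-in using Python's standard `queue.Queue`, not RabbitMQ's API: a fast producer (playing the role of Spark) writes into a bounded buffer, and a slower consumer (playing the role of the ES writer) drains it. The function name and parameters are hypothetical.

```python
import queue
import threading
import time


def run_buffered(records, max_buffered=100, sink_delay=0.001):
    """Push records through a bounded buffer to a slow consumer.

    Returns the records in the order the consumer received them.
    """
    # Bounded queue: put() blocks once max_buffered items are waiting,
    # which slows the producer down instead of overloading the sink.
    buffer = queue.Queue(maxsize=max_buffered)
    delivered = []
    done = object()  # sentinel telling the consumer to stop

    def consumer():
        while True:
            item = buffer.get()
            if item is done:
                break
            time.sleep(sink_delay)  # simulate a sink that cannot keep up
            delivered.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for record in records:
        buffer.put(record)  # blocks while the buffer is full
    buffer.put(done)
    worker.join()
    return delivered
```

For example, `run_buffered(range(1000), max_buffered=50)` delivers all 1000 records in order while never holding more than 50 in the buffer. A real broker adds persistence and decouples the two processes, which this sketch does not attempt.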

Re: Java Recipes for Spark

2016-07-29 Thread Shiva Ramagopal
+1 for the Java love :-)
On 30-Jul-2016 4:39 AM, "Renato Perini" wrote:
> Not only very useful, but finally some Java love :-)
>
> Thank you.
>
> On 29/07/2016 22:30, Jean Georges Perrin wrote:
>> Sorry if this looks like a shameless self promotion, but some of

Re: Unit testing framework for Spark Jobs?

2016-03-24 Thread Shiva Ramagopal
Hi Lars,
Very pragmatic ideas around testing of Spark applications end-to-end!
-Shiva
On Fri, Mar 18, 2016 at 12:35 PM, Lars Albertsson wrote:
> I would recommend against writing unit tests for Spark programs, and
> instead focus on integration tests of jobs or pipelines of

Re: Spark Application Master on Yarn client mode - Virtual memory limit

2016-02-10 Thread Shiva Ramagopal
How are you submitting/running the job - via spark-submit or as a plain old Java program? If you are using spark-submit, you can control the memory setting via the configuration parameter spark.executor.memory in spark-defaults.conf. If you are running it as a Java program, use -Xmx to set the
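As a rough illustration of the two launch paths, the sketch below builds the command line for each case. It passes `spark.executor.memory` with spark-submit's `--conf` option, which sets the same property that could otherwise go into spark-defaults.conf, and passes `-Xmx` to a plain `java` launch. The helper names, jar names, class name and memory sizes are all hypothetical.

```python
def spark_submit_command(app_jar, executor_memory="4g"):
    """Build a spark-submit argv that sets spark.executor.memory.

    The same property can instead be written to spark-defaults.conf
    as a line of the form: spark.executor.memory 4g
    """
    return [
        "spark-submit",
        "--conf", f"spark.executor.memory={executor_memory}",
        app_jar,
    ]


def plain_java_command(jar, main_class, max_heap="4g"):
    """Build a plain java argv that caps the JVM heap with -Xmx."""
    return ["java", f"-Xmx{max_heap}", "-cp", jar, main_class]
```

Either list can be handed to `subprocess.run(...)` on a machine where the corresponding launcher is installed; the values shown are placeholders to be sized for the actual job.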