Re: Spark Streaming to REST API

2017-12-21 Thread ashish rawat
ngu...@gmail.com> wrote: > Hi Ashish, > I was just wondering if there is any particular reason why you are posting this to a SPARK group? > Regards, > Gourav > On Thu, Dec 21, 2017 at 8:32 PM, ashish rawat <dceash...@gmail.com> wrote: >> Hi, …

Spark Streaming to REST API

2017-12-21 Thread ashish rawat
Hi, We are working on a streaming solution where multiple out-of-order streams are flowing into the system and we need to join the streams based on a unique id. We are planning to use Redis for this: for every tuple, we will look up whether the id exists, join if it does, or else put the tuple
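A minimal sketch of the lookup-and-stash join described above, assuming the Jedis client and a DStream[(String, String)] keyed by the unique id; the function name, host parameter, and one-connection-per-partition choice are illustrative, not from the thread.

```scala
import org.apache.spark.streaming.dstream.DStream
import redis.clients.jedis.Jedis

// For each incoming (id, value) tuple, check Redis for a previously stashed
// counterpart; emit the joined pair if found, otherwise stash this value
// under the id and wait for the matching tuple from the other stream.
def joinViaRedis(stream: DStream[(String, String)],
                 redisHost: String): DStream[(String, (String, String))] =
  stream.mapPartitions { tuples =>
    val jedis = new Jedis(redisHost)                        // one connection per partition
    val joined = tuples.flatMap { case (id, value) =>
      Option(jedis.get(id)) match {
        case Some(earlier) => Some((id, (earlier, value)))  // counterpart already arrived
        case None          => jedis.set(id, value); None    // stash and wait
      }
    }.toList                                                // materialise before closing the connection
    jedis.close()
    joined.iterator
  }
```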

Re: NLTK with Spark Streaming

2017-12-01 Thread ashish rawat
…Hakobian, Ph.D., Staff Data Scientist, Rally Health, nicholas.hakob...@rallyhealth.com. On Sun, Nov 26, 2017 at 8:19 AM, ashish rawat <dceash...@gmail.com> wrote: > Thanks Holden and Chetan. > Holden - Have you tried it out? Do you know the right way to do it? > Chetan - yes, if we

Re: NLTK with Spark Streaming

2017-11-26 Thread ashish rawat
…at 3:31 PM, Holden Karau <hol...@pigscanfly.ca> wrote: >> So it’s certainly doable (it’s not super easy, mind you), but until the Arrow UDF release goes out it will be rather slow. >> On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com> …

NLTK with Spark Streaming

2017-11-25 Thread ashish rawat
Hi, Has anyone tried running NLTK (Python) with Spark Streaming (Scala)? I was wondering if this is a good idea, and what the right Spark operators are to do this? The reason we want to try this combination is that we don't want to run our transformations in Python (pyspark), but after the
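One possible answer to the "right Spark operators" question, sketched under assumptions: keep the streaming job in Scala and hand text off to an external NLTK script with RDD.pipe. The script name (nltk_tokenize.py) and the DStream of raw lines are placeholders, not from the thread.

```scala
import org.apache.spark.streaming.dstream.DStream

// Hand each micro-batch to an external Python/NLTK script via pipe(), so the
// streaming job itself never embeds Python. pipe() launches the script once
// per partition and streams records through its stdin/stdout; the script is
// expected to write one processed line per input line.
def tagWithNltk(lines: DStream[String]): DStream[String] =
  lines.transform(rdd => rdd.pipe("python nltk_tokenize.py"))
```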

Re: Spark based Data Warehouse

2017-11-17 Thread ashish rawat
…different jobs or groups. Within a single group, using Livy to create different Spark contexts also works. > - Affan > On Tue, Nov 14, 2017 at 8:43 AM, ashish rawat <dceash...@gmail.com> wrote: >> Thanks Sky Yin. This really helps. >> On Nov 14, 2017 12:

Re: Spark based Data Warehouse

2017-11-13 Thread ashish rawat
…Nov 11, 2017 at 11:21 PM ashish rawat <dceash...@gmail.com> wrote: > Hello Everyone, I was trying to understand if anyone here has tried a data warehouse solution using S3 and Spark SQL. Out of multiple possible options (Redshift, Presto, Hive etc.), we were planning to go

Re: Spark based Data Warehouse

2017-11-13 Thread ashish rawat
…dim.seme...@datadoghq.com> Date: Sunday, November 12, 2017 at 1:06 PM; To: Gourav Sengupta <gourav.sengu...@gmail.com>; Cc: Phillip Henry <londonjava...@gmail.com>, ashish rawat <dceash...@gmail.com>, Jörn Franke <jornfra...@gmail.com>, Deepak S

Re: Spark based Data Warehouse

2017-11-12 Thread ashish rawat
…but you may need to train your data scientists. Some may know or prefer other tools. > On 12. Nov 2017, at 08:32, Deepak Sharma <deepakmc...@gmail.com> wrote: >> I am looking for a similar solution more aligned to the data scientist group. The concern I have is about

Spark based Data Warehouse

2017-11-11 Thread ashish rawat
Hello Everyone, I was trying to understand if anyone here has tried a data warehouse solution using S3 and Spark SQL. Out of multiple possible options (Redshift, Presto, Hive, etc.), we were planning to go with Spark SQL for our aggregates and processing requirements. If anyone has tried it out,
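For reference, a minimal sketch of the S3 + Spark SQL pattern being asked about: read Parquet directly from S3, expose it as a SQL view, and run an aggregate. The bucket, path, and column names are placeholders; s3a:// access also assumes the hadoop-aws package and AWS credentials are configured.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("s3-warehouse-sketch").getOrCreate()

// Read Parquet straight from S3 and register it for SQL queries.
val events = spark.read.parquet("s3a://my-bucket/warehouse/events/")
events.createOrReplaceTempView("events")

// A typical warehouse-style aggregate over the S3-backed table.
val dailyTotals = spark.sql(
  """SELECT event_date, COUNT(*) AS events, SUM(amount) AS revenue
    |FROM events
    |GROUP BY event_date""".stripMargin)
dailyTotals.show()
```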

Re: Spark for Log Analytics

2016-03-31 Thread ashish rawat
…-spark-and-netflix-recommendations. >> btw, Confluent's distribution of Kafka does have a direct HTTP/REST API, which is not recommended for production use but has worked well for me in the past. >> These are some additional options to think about, an

Spark for Log Analytics

2016-03-31 Thread ashish rawat
Hi, I have been evaluating Spark for analysing application and server logs. I believe there are some downsides to doing this: 1. No direct mechanism of collecting logs, so we need to introduce other tools like Flume into the pipeline. 2. Need to write lots of code for parsing different patterns from
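On point 2 (pattern parsing), a small sketch of the regex-per-format approach, shown here for Apache-style access logs and pasteable into spark-shell. The regex, case class, and input path are illustrative only, not from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

case class AccessLog(ip: String, timestamp: String, request: String, status: Int)

// Matches lines like: 10.0.0.1 - - [31/Mar/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 ...
val logPattern = """^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}).*""".r

def parse(line: String): Option[AccessLog] = line match {
  case logPattern(ip, ts, req, status) => Some(AccessLog(ip, ts, req, status.toInt))
  case _                               => None   // skip malformed lines instead of failing the job
}

val sc = new SparkContext(new SparkConf().setAppName("log-parse-sketch"))
val statusCounts = sc.textFile("hdfs:///logs/access/*.log")
  .flatMap(line => parse(line))        // drop unparseable lines
  .map(log => (log.status, 1L))
  .reduceByKey(_ + _)                  // count requests per HTTP status code
statusCounts.collect().foreach(println)
```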

Re: Save GraphX to disk

2015-11-20 Thread Ashish Rawat
Hi Todd, Could you please provide an example of doing this? Mazerunner seems to be doing something similar with Neo4j, but it goes via HDFS and updates only the graph properties. Is there a direct way to do this with Neo4j or Titan? Regards, Ashish From: SLiZn Liu
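This does not answer the direct Neo4j/Titan write being asked about, but as a baseline, here is a sketch of the thread title itself: persisting a GraphX graph's vertex and edge RDDs to disk and rebuilding the graph later. Attribute types and paths are placeholders.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Save the two RDDs that make up the graph as object files on HDFS/disk.
def saveGraph(graph: Graph[String, Double], path: String): Unit = {
  graph.vertices.saveAsObjectFile(s"$path/vertices")
  graph.edges.saveAsObjectFile(s"$path/edges")
}

// Reload both RDDs and reassemble the graph.
def loadGraph(sc: SparkContext, path: String): Graph[String, Double] = {
  val vertices = sc.objectFile[(VertexId, String)](s"$path/vertices")
  val edges    = sc.objectFile[Edge[Double]](s"$path/edges")
  Graph(vertices, edges)
}
```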

Spark Application Hung

2015-03-24 Thread Ashish Rawat
Hi, We are observing a hung Spark application when one of the YARN datanodes (running multiple Spark executors) goes down. Setup details: * Spark: 1.2.1 * Hadoop: 2.4.0 * Spark Application Mode: yarn-client * 2 datanodes (DN1, DN2) * 6 Spark executors (initially 3 executors on