Spark Streaming

2015-01-17 Thread Rohit Pujari
) }
val unifiedStream = ssc.union(streams)
val sparkProcessingParallelism = 1
unifiedStream.repartition(sparkProcessingParallelism)
}
//print(kafkaStream)
ssc.start()
ssc.awaitTermination()
-- Rohit Pujari -- CONFIDENTIALITY NOTICE NOTICE: This message

Re: Spark Streaming

2015-01-17 Thread Rohit Pujari
: Saturday, January 17, 2015 at 4:10 AM To: Rohit Pujari rpuj...@hortonworks.com Subject: Re: Spark Streaming Streams are lazy. Their computation is triggered by an output operator, which is apparently missing from your code. See the programming guide: https

Re: Spark Streaming

2015-01-17 Thread Rohit Pujari
operation on the stream. On Sat, Jan 17, 2015 at 10:17 AM, Rohit Pujari rpuj...@hortonworks.com wrote: Hi Francois: I tried using "print(kafkaStream)" as an output operator but no luck. It throws the same error. Any other thoughts? Thanks, Rohit From: francois.garil...@typesafe.com
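Following the advice in this thread, a minimal sketch of the fix: keep the result of repartition() in a val and attach an output operator so the lazy DStream pipeline is actually executed. The context setup and the `streams` sequence are assumptions standing in for the Kafka streams built in the original code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("KafkaUnionExample")
val ssc = new StreamingContext(conf, Seconds(2))

// `streams` stands in for the Seq of Kafka receiver DStreams created earlier:
// val streams = (1 to numReceivers).map(_ => KafkaUtils.createStream(...))
val unifiedStream = ssc.union(streams)

// repartition() returns a new DStream; the original code discarded the result.
val repartitioned = unifiedStream.repartition(1)

// print() is an output operator -- without one, no computation is triggered.
repartitioned.print()

ssc.start()
ssc.awaitTermination()
```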

Re: Market Basket Analysis

2014-12-05 Thread Rohit Pujari
algos when they really mean they want to compute item similarity or make recommendations. What's your use case? On Thu, Dec 4, 2014 at 8:23 PM, Rohit Pujari rpuj...@hortonworks.com wrote: Sure, I’m looking to perform frequent item set analysis on POS data set. Apriori is a classic algorithm

Market Basket Analysis

2014-12-04 Thread Rohit Pujari
Hello Folks: I'd like to do market basket analysis using Spark, what're my options? Thanks, Rohit Pujari Solutions Architect, Hortonworks

Re: Market Basket Analysis

2014-12-04 Thread Rohit Pujari
to perform a similar task? If there's no spoon to spoon substitute, spoon to fork will suffice too. Hopefully this provides some clarification. Thanks, Rohit From: Tobias Pfeiffer t...@preferred.jp Date: Thursday, December 4, 2014 at 7:20 PM To: Rohit Pujari rpuj
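For the frequent-itemset use case discussed in this thread, MLlib later shipped a parallel FP-Growth implementation (in Spark 1.3, which postdates these mails). A hedged sketch, with an illustrative support threshold and toy transactions in place of real POS data:

```scala
import org.apache.spark.mllib.fpm.FPGrowth

// Toy POS-style baskets; real data would come from HDFS.
val transactions = sc.parallelize(Seq(
  Array("bread", "milk"),
  Array("bread", "butter"),
  Array("bread", "milk", "butter")
))

val model = new FPGrowth()
  .setMinSupport(0.5)   // keep itemsets appearing in at least half the baskets
  .run(transactions)

// Each result carries the itemset and its absolute frequency.
model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + " -> " + itemset.freq)
}
```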

Python Scientific Libraries in Spark

2014-11-24 Thread Rohit Pujari
possible today and some of the active development in the community that's on the horizon. Thanks, Rohit Pujari Solutions Architect, Hortonworks

Re: Spark job doesn't clean after itself

2014-10-12 Thread Rohit Pujari
Reviving this .. any thoughts, experts? On Thu, Oct 9, 2014 at 3:47 PM, Rohit Pujari rpuj...@hortonworks.com wrote: Hello Folks: I'm running a Spark job on YARN. After execution, I would expect the Spark job to clean the staging area, but it seems every run creates a new staging directory

Spark job doesn't clean after itself

2014-10-09 Thread Rohit Pujari
Hello Folks: I'm running a Spark job on YARN. After execution, I would expect the Spark job to clean the staging area, but it seems every run creates a new staging directory. Is there a way to force a Spark job to clean up after itself? Thanks, Rohit
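A hedged sketch for the staging question: in YARN mode Spark stages files under the submitting user's `.sparkStaging` directory and, per the YARN docs, deletes them on normal exit; `spark.yarn.preserve.staging.files` controls this (false is the documented default). Leftovers typically come from failed or killed runs. The class name and jar below are illustrative:

```shell
# Make the cleanup behavior explicit (false is the default).
spark-submit \
  --master yarn \
  --conf spark.yarn.preserve.staging.files=false \
  --class com.example.MyJob \
  myjob.jar

# Remove directories left behind by failed/killed applications by hand:
hadoop fs -rm -r '/user/<user>/.sparkStaging/application_*'
```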

Debug Spark in Cluster Mode

2014-10-09 Thread Rohit Pujari
Hello Folks: What're some best practices to debug Spark in cluster mode? Thanks, Rohit

Re: Can Spark stack scale to petabyte scale without performance degradation?

2014-07-16 Thread Rohit Pujari
, 2014 at 9:17 AM, Rohit Pujari rpuj...@hortonworks.com wrote: Hello Folks: There is a lot of buzz in the Hadoop community around Spark's inability to scale beyond 1 TB datasets (or 10-20 nodes). It is being regarded as great tech for CPU-intensive workloads on smaller data (less than 1 TB

Can Spark stack scale to petabyte scale without performance degradation?

2014-07-15 Thread Rohit Pujari
boundaries of the tech and recommend the right solution for the right problem. Thanks, Rohit Pujari Solutions Engineer, Hortonworks rpuj...@hortonworks.com 716-430-6899

KMeansModel Constructor error

2014-07-14 Thread Rohit Pujari
Hello Folks: I have written a simple program to read the already-saved model from HDFS and score it. But when I try to read the saved model, I get the following error. Any clues what might be going wrong here?
val x = sc.objectFile[Vector]("/data/model").collect()
val y = new
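A hedged sketch of what the truncated snippet appears to be doing: collecting saved cluster centers from an object file and rebuilding an MLlib KMeansModel from them. The HDFS path comes from the snippet; the assumption that the file holds the centers as Vectors, and the sample point, are illustrative:

```scala
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// objectFile takes a quoted path string; the original snippet was missing the quotes.
val centers: Array[Vector] = sc.objectFile[Vector]("/data/model").collect()

// KMeansModel's constructor takes the cluster centers directly.
val model = new KMeansModel(centers)

// Score a point by assigning it to the nearest center.
val cluster = model.predict(Vectors.dense(1.0, 2.0, 3.0))
```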