Re: spark-shell failing but pyspark works

2016-04-02 Thread Cyril Scetbon
Does nobody have any idea? > On Mar 31, 2016, at 23:22, Cyril Scetbon wrote: > > Hi, > > I'm having issues creating a StreamingContext with Scala using spark-shell. > It tries to access the localhost interface, but the Application Master is not > running on that interface:
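
A minimal sketch of one possible workaround, assuming the problem is the driver advertising localhost: set spark.driver.host to a reachable address before the StreamingContext is created. For spark-shell itself the same property would be passed at launch with --conf; the hostname below is a placeholder.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingHostWorkaround extends App {
      // Master URL is supplied by spark-submit; in spark-shell the equivalent is:
      //   spark-shell --conf spark.driver.host=<reachable-host>
      val conf = new SparkConf()
        .setAppName("streaming-host-workaround")
        .set("spark.driver.host", "driver-host.example.com") // placeholder: an address the cluster can reach
      val ssc = new StreamingContext(conf, Seconds(5))
    }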

[jira] Vijay Parmar shared "PIG-4824: FOREACH throwing error" with you

2016-04-02 Thread Vijay Parmar (JIRA)
Vijay Parmar shared an issue with you --- Is anyone else also facing a similar issue with FOREACH in Pig? > FOREACH throwing error > -- > > Key: PIG-4824 > URL:

Re: Working out SQRT on a list

2016-04-02 Thread Mich Talebzadeh
Try this, specifying sqrt as a Math function: scala> val l = List(2,9,90,66) l: List[Int] = List(2, 9, 90, 66) scala> l.map(x => Math.sqrt(x*x)) res0: List[Double] = List(2.0, 9.0, 90.0, 66.0) HTH Dr Mich Talebzadeh LinkedIn *
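
A runnable restatement of the suggestion above (assuming a plain Scala REPL or spark-shell): Math.sqrt(x * x) recovers the magnitude of each squared element, while mapping Math.sqrt directly takes the square root of the original values.

    val l = List(2, 9, 90, 66)

    val viaSquares = l.map(x => Math.sqrt(x * x)) // List(2.0, 9.0, 90.0, 66.0)
    val direct     = l.map(x => Math.sqrt(x))     // List(1.414..., 3.0, 9.486..., 8.124...)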

RE: Spark vs Redshift

2016-04-02 Thread rajesh.prabhu
Hi Eris, I also found this rather old discussion about Spark vs Redshift: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-v-Redshift-td18112.html Regards, Rajesh Basel, Switzerland Ph: +41 77 941 0562 rajesh.pra...@wipro.com From: Mich Talebzadeh

Re: Spark vs Redshift

2016-04-02 Thread Mich Talebzadeh
Hi, As with any tool, your mileage will vary. To start, what is your use case here (what fits your needs)? You stated that you want to perform OLAP on large datasets. OLAP is normally performed on large datasets anyway, so I assume you already have some form of Data Warehouse

Working out SQRT on a list

2016-04-02 Thread Ashok Kumar
Hi, I would like a simple sqrt operation on a list but I don't get the result. scala> val l = List(1,5,786,25) l: List[Int] = List(1, 5, 786, 25) scala> l.map(x => x * x) res42: List[Int] = List(1, 25, 617796, 625) scala> l.map(x => x * x).sqrt <console>:28: error: value sqrt is not a member of List[Int]

Spark vs Redshift

2016-04-02 Thread Eris Lawrence
Hi Spark devs, I recently attended a tech session on data processing with Spark vs Redshift, which concluded with metrics and data points showing that, for 2 billion records, select queries filtering on attributes were faster and cheaper on AWS Redshift than on an AWS Spark cluster. I

Re: --packages configuration equivalent item name?

2016-04-02 Thread Russell Jurney
Thanks, Andy! On Mon, Mar 28, 2016 at 8:44 AM, Andy Davidson < a...@santacruzintegration.com> wrote: > Hi Russell > > I use Jupyter python notebooks a lot. Here is how I start the server > > set -x # turn debugging on > > #set +x # turn debugging off > > > #

What is the most efficient way to do a sorted reduce in PySpark?

2016-04-02 Thread Russell Jurney
Dear Spark Users, I need assistance in understanding which way I should do a sorted reduce in PySpark. Yes, I know all reduces are sorted because sorting is grouping, but what I mean is that I need to create a tuple where the first field is a key, and the second field is a sorted list of all
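
A minimal sketch of the (key, sorted list of values) pattern, shown in Scala to match the other snippets in this digest and assuming a SparkContext sc as in spark-shell; the equivalent PySpark chain would be rdd.groupByKey().mapValues(sorted). Names and data are illustrative.

    // Each key's values are collected and sorted on the executors, not on the driver.
    val pairs = sc.parallelize(Seq(("a", 3), ("a", 1), ("b", 2), ("a", 2)))

    val sortedPerKey = pairs
      .groupByKey()               // (key, Iterable[values])
      .mapValues(_.toList.sorted) // (key, sorted list of values)

    // sortedPerKey.collect() => Array((a,List(1, 2, 3)), (b,List(2)))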

Re: spark-shell with different username

2016-04-02 Thread Matt Tenenbaum
Hi Mich. I certainly should have included that info in my original message (sorry!): it's a mac, running OS X (10.11.3). Cheers -mt On Fri, Apr 1, 2016 at 11:16 PM, Mich Talebzadeh wrote: > Matt, > > What OS are you using on your laptop? Sounds like Ubuntu or

RE: Spark Metrics : Why is the Sink class declared private[spark] ?

2016-04-02 Thread Silvio Fiorito
In the meantime you can simply define your custom metric source in the org.apache.spark package. From: Walid Lezzar Sent: Saturday, April 2, 2016 4:23 AM To: Saisai Shao Cc: spark users Subject: Re: Spark
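
A hedged sketch of the package-placement workaround described above: because the metrics Source trait is private[spark], a class extending it has to live under the org.apache.spark package itself. Member names follow the Spark 1.6-era sources and may differ between versions; the class and metric names are illustrative. Registering the source (e.g. via SparkEnv.get.metricsSystem.registerSource) is similarly restricted, so that call also needs to sit under org.apache.spark.

    package org.apache.spark.metrics.source

    import com.codahale.metrics.{Gauge, MetricRegistry}

    // Compiles only because this file sits inside the org.apache.spark namespace,
    // which grants access to the private[spark] Source trait.
    class MyAppSource extends Source {
      override val sourceName: String = "myApp"
      override val metricRegistry: MetricRegistry = new MetricRegistry

      metricRegistry.register(MetricRegistry.name("records", "processed"),
        new Gauge[Long] { override def getValue: Long = 0L })
    }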

Re: Multiple lookups; consolidate result and run further aggregations

2016-04-02 Thread Ted Yu
Looking at the implementation of lookup in PairRDDFunctions, I think your understanding is correct. On Sat, Apr 2, 2016 at 3:16 AM, Nirav Patel wrote: > I will start with a question: Is the Spark lookup function on a pair RDD a driver > action, i.e. is the result returned to the driver?

[no subject]

2016-04-02 Thread Hemalatha A
Hello, The Spark programming guide says "we should have 2-4 partitions for each CPU in your cluster." In this case, how does 1 CPU core process 2-4 partitions at the same time? Link - http://spark.apache.org/docs/latest/programming-guide.html (under the RDD section) Does it do context
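
For context, a hedged sketch of where the partition count is actually set: each partition becomes one task, and a single core works through its queue of tasks one after another rather than simultaneously. The numbers below are illustrative.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("partitions-demo")
      .setMaster("local[2]")                 // 2 cores
      .set("spark.default.parallelism", "8") // ~4 partitions per core

    val sc = new SparkContext(conf)

    // 8 partitions => 8 tasks; the 2 cores run them sequentially, 2 at a time
    val rdd = sc.parallelize(1 to 1000, numSlices = 8)
    println(rdd.partitions.length)           // 8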

Re: Scala: Perform Unit Testing in spark

2016-04-02 Thread Ted Yu
I think you should specify dependencies in this way: "org.apache.spark" % "spark-core_2.10" % "1.6.0" % "tests" Please refer to http://www.scalatest.org/user_guide/using_scalatest_with_sbt On Fri, Apr 1, 2016 at 3:33 PM, Shishir Anshuman wrote: > When I added
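
A minimal ScalaTest sketch along those lines, assuming scalatest and spark-core are on the test classpath; this is a generic local-mode test, not code from the original thread.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.{BeforeAndAfterAll, FunSuite}

    class WordCountSuite extends FunSuite with BeforeAndAfterAll {
      private var sc: SparkContext = _

      override def beforeAll(): Unit = {
        sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("unit-test"))
      }

      override def afterAll(): Unit = sc.stop()

      test("reduceByKey counts words") {
        val counts = sc.parallelize(Seq("a", "b", "a"))
          .map((_, 1))
          .reduceByKey(_ + _)
          .collectAsMap()
        assert(counts("a") === 2)
      }
    }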

Re: spark 1.5.2 - value filterByRange is not a member of org.apache.spark.rdd.RDD[(myKey, myData)]

2016-04-02 Thread Nirav Patel
In the second class I re-declared the following and the compile error went away. Your solution worked too. implicit val rowKeyOrdering = rowKeyOrd Thanks, Nirav On Wed, Mar 30, 2016 at 7:36 PM, Ted Yu wrote: > Have you tried the following construct? > > new OrderedRDDFunctions[K, V,
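
A hedged sketch of the fix being described: filterByRange (and the rest of OrderedRDDFunctions) only resolves when an implicit Ordering for the key type is in scope, so it has to be re-declared in every file that uses it. MyKey and its fields are illustrative names, not from the original thread.

    case class MyKey(id: Int, ts: Long)

    // Re-declare the ordering wherever the pair RDD is used; this is what makes
    // sortByKey / filterByRange resolve on an RDD[(MyKey, MyData)].
    implicit val myKeyOrdering: Ordering[MyKey] =
      Ordering.by((k: MyKey) => (k.id, k.ts))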

Multiple lookups; consolidate result and run further aggregations

2016-04-02 Thread Nirav Patel
I will start with a question: Is the Spark lookup function on a pair RDD a driver action, i.e. is the result returned to the driver? I have a list of keys on the driver side and I want to perform multiple parallel lookups on the pair RDD, which returns Seq[V]; consolidate the results; and perform further
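
One hedged alternative to issuing many driver-side lookup() calls is to broadcast the key list and do the consolidation in a single distributed pass; a sketch follows, assuming a SparkContext sc as in spark-shell, with illustrative data.

    val pairRdd = sc.parallelize(Seq(("k1", 10), ("k2", 20), ("k1", 30), ("k9", 40)))
    val keys    = Seq("k1", "k2", "k3")                   // keys held on the driver
    val keySet  = sc.broadcast(keys.toSet)

    val consolidated = pairRdd
      .filter { case (k, _) => keySet.value.contains(k) } // one job instead of N lookups
      .groupByKey()                                       // RDD[(String, Iterable[Int])]

    // Further aggregations stay on the cluster, e.g.
    val counts = consolidated.mapValues(_.size)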

Re: spark-shell with different username

2016-04-02 Thread Sebastian YEPES FERNANDEZ
Matt, have you tried using the parameter --proxy-user matt On Apr 2, 2016 8:17 AM, "Mich Talebzadeh" wrote: > Matt, > > What OS are you using on your laptop? Sounds like Ubuntu or something? > > Thanks > > Dr Mich Talebzadeh > > > > LinkedIn * >

Re: How to efficiently Scan (not filter nor lookup) part of Paird RDD or Ordered RDD

2016-04-02 Thread Nirav Patel
@Ilya Ganellin, not sure how zipWithIndex() will do less than an O(n) scan. The Spark doc doesn't mention anything about it. I found a solution with Spark 1.5.2's OrderedRDDFunctions: it has a filterByRange API. Thanks On Sun, Jan 24, 2016 at 10:27 PM, Sonal Goyal wrote: > One thing
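
A hedged sketch of the filterByRange approach mentioned above (available in Spark 1.5.2 via OrderedRDDFunctions): when the pair RDD is range-partitioned, e.g. after sortByKey, the range filter can prune whole partitions instead of scanning everything. Data and bounds are illustrative; assumes a SparkContext sc as in spark-shell.

    val pairs  = sc.parallelize((1 to 1000).map(i => (i, s"value-$i")))
    val sorted = pairs.sortByKey()              // RangePartitioner => partition pruning is possible
    val slice  = sorted.filterByRange(100, 200) // keys in [100, 200]

    println(slice.count())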

Re: Spark Metrics : Why is the Sink class declared private[spark] ?

2016-04-02 Thread Walid Lezzar
This is great! Hope this JIRA will be resolved for the next version of Spark. Thanks. > On Apr 2, 2016, at 01:07, Saisai Shao wrote: > > There's a JIRA (https://issues.apache.org/jira/browse/SPARK-14151) about it, > please take a look. > > Thanks > Saisai > >> On

Re: spark-shell with different username

2016-04-02 Thread Mich Talebzadeh
Matt, What OS are you using on your laptop? Sounds like Ubuntu or something? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw *

Spark streaming rawSocketStream with protobuf

2016-04-02 Thread lokeshkumar
I am trying out Spark Streaming, listening to a socket. I am using the rawSocketStream method to create a receiver and a DStream, but when I print the DStream I get the exception below. Code to create a DStream: JavaSparkContext jsc = new JavaSparkContext("Master", "app"); JavaStreamingContext
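
rawSocketStream expects blocks that were already serialized with Spark's own serializer, which is a common source of deserialization errors when the wire format is something else, such as protobuf. A hedged sketch of one alternative, written in Scala: socketStream takes a converter from the raw InputStream to an Iterator of records, so delimited protobuf messages can be parsed directly. MyEvent is a hypothetical protobuf-generated class (parseDelimitedFrom is the standard generated method); the other names are illustrative.

    import java.io.InputStream

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object ProtoSocketStream extends App {
      val ssc = new StreamingContext(
        new SparkConf().setAppName("proto-stream").setMaster("local[2]"), Seconds(2))

      // MyEvent: hypothetical protobuf-generated class; parseDelimitedFrom returns null at end of stream
      def converter(in: InputStream): Iterator[MyEvent] =
        Iterator.continually(MyEvent.parseDelimitedFrom(in)).takeWhile(_ != null)

      val events = ssc.socketStream("localhost", 9999, converter,
        StorageLevel.MEMORY_AND_DISK_SER)
      events.print()

      ssc.start()
      ssc.awaitTermination()
    }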