Custom Session Windowing in Spark using Scala/Python

2023-08-03 Thread Ravi Teja
Hi, I am new to Spark and looking for help with session windowing. I want to create session windows on a user activity stream with a gap duration of `x` minutes and also have
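
For reference, a minimal sketch of gap-based session windows with Structured Streaming's `session_window` function (available from Spark 3.2; the rate source, the `userId` derivation, and the 5-minute gap below are illustrative assumptions, not details from the thread):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SessionWindowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("sessions").master("local[*]").getOrCreate()

    // Synthetic activity stream: the built-in rate source emits (timestamp, value).
    val events = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
      .select((col("value") % 3).as("userId"), col("timestamp").as("eventTime"))

    // A session closes after a 5-minute gap of inactivity per user.
    val sessions = events
      .withWatermark("eventTime", "10 minutes")
      .groupBy(col("userId"), session_window(col("eventTime"), "5 minutes"))
      .count()

    // Streaming session windows require a watermark and append output mode.
    sessions.writeStream.outputMode("append").format("console").start().awaitTermination()
  }
}
```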

Re: Integration testing Framework Spark SQL Scala

2020-11-02 Thread Lars Albertsson
Hi, Sorry for the very slow reply - I am far behind in my mailing list subscriptions. You'll find a few slides covering the topic in this presentation: https://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458 Video here: https://vimeo.com/192429554 Regards,

elasticsearch-hadoop is not compatible with spark 3.0 (scala 2.12)

2020-06-23 Thread murat migdisoglu
Hi, I'm testing our codebase against spark 3.0.0 stack and I realized that elasticsearch-hadoop libraries are built against scala 2.11 and thus are not working with spark 3.0.0. (and probably 2.4.2). Is there anybody else facing this issue? How did you solve it? The PR on the ES library is open

Re: Integration testing Framework Spark SQL Scala

2020-02-25 Thread Ruijing Li
Just wanted to follow up on this. If anyone has any advice, I’d be interested in learning more! On Thu, Feb 20, 2020 at 6:09 PM Ruijing Li wrote: > Hi all, > > I’m interested in hearing the community’s thoughts on best practices to do > integration testing for spark sql jobs. We run a lot of

Integration testing Framework Spark SQL Scala

2020-02-20 Thread Ruijing Li
Hi all, I’m interested in hearing the community’s thoughts on best practices to do integration testing for spark sql jobs. We run a lot of our jobs with cloud infrastructure and hdfs - this makes debugging a challenge for us, especially with problems that don’t occur from just initializing a
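
One common pattern, sketched below under stated assumptions (ScalaTest on the test classpath, and a hypothetical transformation under test): keep the SQL logic free of I/O so it can be exercised against an in-process local[*] session, without a cluster or HDFS.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.scalatest.funsuite.AnyFunSuite

object SalesJob {
  // The transformation under test, kept free of reads/writes so it is testable.
  def totalsByYear(df: DataFrame): DataFrame =
    df.groupBy("year").agg(sum("amount").as("total"))
}

class SalesJobSpec extends AnyFunSuite {
  test("totalsByYear aggregates amounts per year") {
    val spark = SparkSession.builder.master("local[*]").appName("it-test").getOrCreate()
    import spark.implicits._
    val input = Seq((2019, 10.0), (2019, 5.0), (2020, 7.0)).toDF("year", "amount")

    val result = SalesJob.totalsByYear(input)
      .collect().map(r => (r.getInt(0), r.getDouble(1))).toMap

    assert(result == Map(2019 -> 15.0, 2020 -> 7.0))
    spark.stop()
  }
}
```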

Spark 2.4 scala 2.12 Regular Expressions Approach

2019-07-15 Thread anbutech
Hi All, Could you please help me fix the below issue using spark 2.4, scala 2.12? How do we extract multiple values from the given file name pattern using a spark/scala regular expression? Please give me some idea on the below approach. object Driver { private val filePattern
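
The truncated filePattern isn't recoverable, but here is a sketch of the general technique — a Scala regex with capture groups used as a pattern-match extractor — with an invented file-name layout:

```scala
object Driver {
  // Invented layout: "sales_2019-07-15_region42.csv" -> ("sales", "2019-07-15", "42")
  private val filePattern = """^(\w+)_(\d{4}-\d{2}-\d{2})_region(\d+)\.csv$""".r

  def main(args: Array[String]): Unit = {
    "sales_2019-07-15_region42.csv" match {
      case filePattern(dataset, date, region) =>
        println(s"dataset=$dataset date=$date region=$region")
      case other =>
        println(s"unmatched file name: $other")
    }
  }
}
```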

Re: Best notebook for developing for apache spark using scala on Amazon EMR Cluster

2019-05-01 Thread Jeff Zhang
You can configure zeppelin to store your notes in S3 http://zeppelin.apache.org/docs/0.8.1/setup/storage/storage.html#notebook-storage-in-s3 V0lleyBallJunki3 wrote on Wed, May 1, 2019 at 5:26 AM: Hello. I am using Zeppelin on Amazon EMR cluster while developing Apache Spark programs in

Best notebook for developing for apache spark using scala on Amazon EMR Cluster

2019-04-30 Thread V0lleyBallJunki3
Hello. I am using Zeppelin on an Amazon EMR cluster while developing Apache Spark programs in Scala. The problem is that once the cluster is destroyed I lose all the notebooks on it. So over a period of time I have a lot of notebooks that have to be manually exported to my local disk and from

Spark with Scala : understanding closures or best way to take udf registrations' code out of main and put in utils

2018-08-21 Thread aastha
This is more of a Scala concept question than a Spark one. I have this Spark initialization code: object EntryPoint { val spark = SparkFactory.createSparkSession(... val funcsSingleton = ContextSingleton[CustomFunctions] { new CustomFunctions(Some(hashConf)) } lazy val funcs =
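
A minimal sketch of one way to keep UDF registration out of main (all names below are invented): a helper object registers functions against whichever SparkSession it is handed, and its closures capture only serializable values.

```scala
import org.apache.spark.sql.SparkSession

object UdfRegistry extends Serializable {
  private val prefix = "user_" // only serializable state is captured by the closures

  def registerAll(spark: SparkSession): Unit = {
    spark.udf.register("add_prefix", (s: String) => prefix + s)
    spark.udf.register("squared", (x: Int) => x * x)
  }
}

object EntryPoint {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("udf-demo").master("local[*]").getOrCreate()
    UdfRegistry.registerAll(spark) // main stays small; registration lives in utils
    spark.sql("SELECT add_prefix('alice') AS name, squared(4) AS sq").show()
    spark.stop()
  }
}
```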

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-16 Thread Gourav Sengupta
with .pipe(.py) as Prem suggested. That passes the RDD as CSV strings to the python script. The python script can either process it line by line, create the result and return it back. Or create things like Pandas Dataframe for processi

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-15 Thread Chetan Khatri
Jayant Shekhar wrote: Hello Chetan, We have currently done it with .pipe(.py) as Prem suggested. That passes the RDD as CSV strings to the python script. The python script can either process it line by line, c

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-12 Thread Jayant Shekhar
either process it line by line, create the result and return it back. Or create things like Pandas Dataframe for processing and finally write the results back. In the Spark/Scala/Java code, you get an RDD of string, which we convert back to a Dataframe.

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-09 Thread Chetan Khatri
like Pandas Dataframe for processing and finally write the results back. In the Spark/Scala/Java code, you get an RDD of string, which we convert back to a Dataframe. Feel free to ping me directly in case of questions. Thanks, Jayant On Thu, Jul 5

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-05 Thread Jayant Shekhar
the results back. In the Spark/Scala/Java code, you get an RDD of string, which we convert back to a Dataframe. Feel free to ping me directly in case of questions. Thanks, Jayant On Thu, Jul 5, 2018 at 3:39 AM, Chetan Khatri wrote: Prem sure, thanks for the suggestion. On Wed, Ju
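
For reference, a short sketch of the approach described in this thread — serialize rows to CSV strings, pipe them through an external Python script with RDD.pipe, and parse the lines that come back. The script path and row layout are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object PipeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("pipe").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("alice", 3), ("bob", 5)).toDF("name", "count")

    // DataFrame -> RDD of CSV strings -> external script -> RDD of strings.
    val out = df.rdd
      .map(row => s"${row.getString(0)},${row.getInt(1)}")
      .pipe("./transform.py") // hypothetical script reading stdin line by line

    // Parse the script's CSV output back into a DataFrame.
    val result = out.map(_.split(",")).map(a => (a(0), a(1).toInt)).toDF("name", "count")
    result.show()
    spark.stop()
  }
}
```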

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-05 Thread Chetan Khatri
Dataset API. Can someone please guide me on which would be the best approach to do this. The Python function would be mostly a transformation function. Also I would like to pass a Java Function as a String to the Spark / Scala job so that it applies to an RDD / Data Frame and returns an RDD / Data Frame. Thank you.

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-04 Thread Prem Sure
user defined function to a Spark Job developed using Scala, and the return value of that function would be returned to the DF / Dataset API. Can someone please guide me on which would be the best approach to do this. The Python function would be mostly a transformation function

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-04 Thread Chetan Khatri
Dataset API. Can someone please guide me on which would be the best approach to do this. The Python function would be mostly a transformation function. Also I would like to pass a Java Function as a String to the Spark / Scala job so that it applies to an RDD / Data Frame and returns an RDD / Data Frame. Thank you.

Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-03 Thread Chetan Khatri
transformation function. Also I would like to pass a Java Function as a String to the Spark / Scala job so that it applies to an RDD / Data Frame and returns an RDD / Data Frame. Thank you.

Re: Spark with Scala 2.12

2018-04-21 Thread Mark Hamstra
Even more to the point: http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-2-12-support-td23833.html tldr; It's an item of discussion, but there is no imminent release of Spark that will use Scala 2.12. On Sat, Apr 21, 2018 at 2:44 AM, purijatin wrote: > I see

Re: Spark with Scala 2.12

2018-04-21 Thread purijatin
I see a discussion post on the dev mailing list: http://apache-spark-developers-list.1001551.n3.nabble.com/time-for-Apache-Spark-3-0-td23755.html#a23830 Thanks -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Spark with Scala 2.12

2018-04-20 Thread Jatin Puri
Hello. I am wondering if there is any new update on the Spark upgrade to Scala 2.12. https://issues.apache.org/jira/browse/SPARK-14220. Especially given that Scala 2.13 is nearing release. This is because there is no recent update on the Jira and related ticket. Maybe someone

CI/CD for spark and scala

2018-01-24 Thread Deepak Sharma
Hi All, I just wanted to check if there are any best practises around using CI/CD for spark / scala projects running on AWS hadoop clusters. If there are any specific tools, please do let me know. -- Thanks Deepak

Re: XML Parsing with Spark and SCala

2017-08-11 Thread Jörn Franke
load and the desired output is also not coming. I am attaching a file. Can anyone help me to do this solvePuzzle1.scala <http://apache-spark-user-list.1001560.n3.nabble.com/file/n29053/solvePuzzle1.scala>

XML Parsing with Spark and SCala

2017-08-11 Thread Etisha Jain
Hi, I want to do xml parsing with spark, but the data from the file does not load and the desired output is not produced. I am attaching a file. Can anyone help me to do this solvePuzzle1.scala <http://apache-spark-user-list.1001560.n3.nabble.com/file/n29053/solvePuzzle1.sc
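
The attachment isn't available, but here is a minimal sketch of one common route — the Databricks spark-xml data source (an assumption, not necessarily what the original code used) — where each element matching rowTag becomes a row:

```scala
import org.apache.spark.sql.SparkSession

object XmlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("xml").master("local[*]").getOrCreate()

    // Requires the spark-xml package on the classpath; path and rowTag are placeholders.
    val df = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "record") // each <record> element becomes one row
      .load("/path/to/input.xml")

    df.printSchema()
    df.show()
    spark.stop()
  }
}
```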

command to get list of all persisted RDDs in spark 2.0 scala shell

2017-06-01 Thread nancy henry
Hi Team, Please let me know how to get a list of all persisted RDDs in the Spark 2.0 shell. Regards, Nancy
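
The usual answer is SparkContext.getPersistentRDDs; a short spark-shell sketch:

```scala
val rdd = sc.parallelize(1 to 100).setName("numbers").cache()
rdd.count() // materialize so the RDD is actually persisted

// Map of RDD id -> RDD for everything currently marked persistent.
sc.getPersistentRDDs.foreach { case (id, r) =>
  println(s"id=$id name=${r.name} storage=${r.getStorageLevel}")
}
```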

Re: Spark 2.0 Scala 2.11 and Kafka 0.10 Scala 2.10

2017-02-08 Thread Cody Koeninger
<u...@moosheimer.com> wrote: Dear devs, is it possible to use Spark 2.0.2 Scala 2.11 and consume messages from a kafka server 0.10.0.2 running on Scala 2.10? I tried this for the last two days using createDirectStream and can't get any messages out of kafka?! I'm usin

Spark 2.0 Scala 2.11 and Kafka 0.10 Scala 2.10

2017-02-08 Thread u...@moosheimer.com
Dear devs, is it possible to use Spark 2.0.2 Scala 2.11 and consume messages from a kafka server 0.10.0.2 running on Scala 2.10? I tried this for the last two days using createDirectStream and can't get any messages out of kafka?! I'm using HDP 2.5.3 running kafka_2.10-0.10.0.2.5.3.0-37 and Spark
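
The broker's Scala version is internal to Kafka; the client side only needs a matching spark-streaming-kafka integration jar. A minimal sketch of the 0-10 direct stream (broker address, group id, and topic are placeholders):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaDirectSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-direct").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "earliest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("mytopic"), kafkaParams))

    stream.map(record => record.value).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```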

Re: Assembly for Kafka >= 0.10.0, Spark 2.2.0, Scala 2.11

2017-01-18 Thread Cody Koeninger
Karamba <phantom...@web.de> wrote: Hi, I am looking for an assembly for Spark 2.2.0 with Scala 2.11. I can't find one in MVN Repository. Moreover, "org.apache.spark" %% "spark-streaming-kafka-0-10_2.11" % "2.1.0 shows that even sbt does not find o

Assembly for Kafka >= 0.10.0, Spark 2.2.0, Scala 2.11

2017-01-18 Thread Karamba
Hi, I am looking for an assembly for Spark 2.2.0 with Scala 2.11. I can't find one in MVN Repository. Moreover, "org.apache.spark" %% "spark-streaming-kafka-0-10_2.11" % "2.1.0 shows that even sbt does not find one: [error] (*:update) sbt.ResolveException: unresolved dep
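
The unresolved dependency most likely comes from combining %% with an artifact name that already carries the Scala suffix, which makes sbt look for spark-streaming-kafka-0-10_2.11_2.11. A sketch of the two working spellings:

```scala
// %% appends the Scala suffix automatically, so leave it off the artifact name:
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0"
// or spell the suffix out yourself and use a single %:
libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0"
```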

Re: [Spark SQL - Scala] TestHive not working in Spark 2

2017-01-13 Thread Xin Wu
From: Xin Wu [mailto:xwu0...@gmail.com] Sent: 13 January 2017 12:43 To: Nicolas Tallineau <nicolas.tallin...@ubisoft.com> Cc: user@spark.apache.org Subject: Re: [Spark SQL - Scala] TestHive not working in Spark

Re: [Spark SQL - Scala] TestHive not working in Spark 2

2017-01-13 Thread Xin Wu
Sent: 13 January 2017 12:43 To: Nicolas Tallineau <nicolas.tallin...@ubisoft.com> Cc: user@spark.apache.org Subject: Re: [Spark SQL - Scala] TestHive not working in Spark 2 I used the following: val testHive = new org.apache.spark.sql.hive.test.Tes

RE: [Spark SQL - Scala] TestHive not working in Spark 2

2017-01-13 Thread Nicolas Tallineau
Sent: 13 January 2017 12:43 To: Nicolas Tallineau <nicolas.tallin...@ubisoft.com> Cc: user@spark.apache.org Subject: Re: [Spark SQL - Scala] TestHive not working in Spark 2 I used the following: val testHive = new org.apache.spark.sql.hive.test.TestHiveContext(sc, false

Re: [Spark SQL - Scala] TestHive not working in Spark 2

2017-01-13 Thread Xin Wu
I used the following: val testHive = new org.apache.spark.sql.hive.test.TestHiveContext(sc, false) val hiveClient = testHive.sessionState.metadataHive hiveClient.runSqlHive("….") On Fri, Jan 13, 2017 at 6:40 AM, Nicolas Tallineau <nicolas.tallin...@ubisoft.com> wrote: I get a
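
The fix quoted in this thread, reassembled as a sketch (the Hive statement is a placeholder): construct TestHiveContext directly with loadTestTables = false instead of using the TestHive singleton, then issue statements through the metadata Hive client.

```scala
// Requires the spark-hive test artifact on the test classpath.
val testHive = new org.apache.spark.sql.hive.test.TestHiveContext(sc, false) // loadTestTables = false
val hiveClient = testHive.sessionState.metadataHive
hiveClient.runSqlHive("CREATE TABLE IF NOT EXISTS t (id INT)") // placeholder statement
```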

[Spark SQL - Scala] TestHive not working in Spark 2

2017-01-13 Thread Nicolas Tallineau
I get a NullPointerException as soon as I try to execute a TestHive.sql(...) statement since migrating to Spark 2, because it's trying to load non-existing "test tables". I couldn't find a way to set the loadTestTables variable to false. Caused by: sbt.ForkMain$ForkError:

Pasting oddity with Spark 2.0 (scala)

2016-11-14 Thread jggg777
DataFrame, StructType, etc (for example org.apache.spark.sql.DataFrame) does work, but it's a painful workaround and we don't know why the imports don't seem to be working as usual.

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Mich Talebzadeh
pedantic, non intended). [image: Inline images 1] Dr Mich Talebzadeh

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Alonso Isidoro Roman
in no case be liable for any monetary damages arising from such loss, damage or destruction. On 5 September 2016 at 15:08, Alonso Isidoro Roman <alons...@gmail.com> wrote: Hi Mitch, I wrote a few months ago a tiny pr

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Alonso Isidoro Roman
ir/awesome-recommendation-engine> Alonso Isidoro Roman about.me/alonso.isidoro.roman <https://about.me/alonso.isidoro.roman?promo=email_sig_source=email_sig_medium=email_sig_campaign=external_links>

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Mich Talebzadeh
Isidoro Roman about.me/alonso.isidoro.roman <https://about.me/alonso.isidoro.roman?promo=email_sig_source=email_sig_medium=email_sig_campaign=external_links> 2016-09-05 15:41 GMT+02:00 Mich Talebzadeh <mich.talebza...@gmail.com>: Hi,

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Alonso Isidoro Roman
about.me/alonso.isidoro.roman <https://about.me/alonso.isidoro.roman?promo=email_sig_source=email_sig_medium=email_sig_campaign=external_links> 2016-09-05 15:41 GMT+02:00 Mich Talebzadeh <mich.talebza...@gmail.com>: Hi, Has anyone done any work on Real time recomm

Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Mich Talebzadeh
Hi, Has anyone done any work on real time recommendation engines with Spark and Scala? I have seen a few PPTs with Python but wanted to see if these have been done with Scala. I trust this question makes sense. Thanks. P.S. My prime interest would be in financial markets. Dr Mich Talebzadeh

Re: How to read *.jhist file in Spark using scala

2016-05-24 Thread Miles

spark w/ scala 2.11 and PackratParsers

2016-05-04 Thread matd
face at some point? Mathieu

[Ask :]Best Practices - Application logging in Spark 1.5.2 + Scala 2.10

2016-04-21 Thread Divya Gehlot
Hi, I am using Spark with a Hadoop 2.7 cluster. I need to print all my print statements and/or any errors to a file, for instance some info if a certain level is passed, or some error if something is missing in my Spark Scala script. Can somebody help me or redirect me to a tutorial, blog, or book? What's the best way
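
A minimal sketch with log4j 1.x (which Spark 1.5 bundles); the logger is kept @transient because loggers are not serializable, and conf/log4j.properties decides which levels go to which file:

```scala
import org.apache.log4j.{Level, Logger}

object MyJob {
  // @transient so the logger is never dragged into task closures.
  @transient lazy val log = Logger.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {
    log.setLevel(Level.INFO)
    log.info("job starting") // routed to a file via conf/log4j.properties
    if (args.isEmpty) log.error("expected an input path argument")
  }
}
```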

RE: pass one dataframe column value to another dataframe filter expression + Spark 1.5 + scala

2016-02-05 Thread Lohith Samaga M
...@gmail.com] Sent: Friday, February 05, 2016 13:12 To: user @spark Subject: pass one dataframe column value to another dataframe filter expression + Spark 1.5 + scala Hi, I have two input datasets. The first input dataset is as below: year,make,model,comment,blank "2012",

Re: pass one dataframe column value to another dataframe filter expression + Spark 1.5 + scala

2016-02-05 Thread Ali Tajeldin
/ With kind regards / Sincere regards M. Lohith Samaga From: Divya Gehlot [mailto:divya.htco...@gmail.com] Sent: Friday, February 05, 2016 13:12 To: user @spark Subject: pass one dataframe column value to another dataframe filter expression + Spark

pass one dataframe column value to another dataframe filter expression + Spark 1.5 + scala

2016-02-04 Thread Divya Gehlot
attribute(s) TagId#5 missing from comment#3,blank#4,model#2,make#1,year#0 in operator !Project [year#0,make#1,model#2,comment#3,blank#4,TagId#5 AS TagId#8]; finaldf.write.format("com.databricks.spark.csv").option("header", "true").save("/TestDivya/Spark/carswithtags.csv") Would really appreciate it if somebody could give me pointers on how I can pass the filter condition (second dataframe) to the filter function of the first dataframe, or another solution. My apologies for such a naive question as I am new to Scala and Spark. Thanks
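
Two common ways around this error, sketched below with df1/df2 standing in for the two DataFrames from the question: collect the single value to the driver and filter with a literal, or express the lookup as a semi join so no column from the second DataFrame leaks into the first plan.

```scala
import org.apache.spark.sql.functions._

// 1) Collect the scalar to the driver, then filter with a literal.
val tagId = df2.select("TagId").first().getString(0)
val filtered = df1.filter(col("TagId") === tagId)

// 2) Keep it distributed: a (broadcast) semi join on the filter column.
val joined = df1.join(broadcast(df2.select("TagId")), Seq("TagId"), "leftsemi")
```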

Optimized way to multiply two large matrices and save output using Spark and Scala

2016-01-13 Thread Devi P.V
I want to multiply two large matrices (from csv files) using Spark and Scala and save the output. I use the following code val rows=file1.coalesce(1,false).map(x=>{ val line=x.split(delimiter).map(_.toDouble) Vectors.sparse(line.length, line.zipWithIndex.map(e => (e._2

Re: Optimized way to multiply two large matrices and save output using Spark and Scala

2016-01-13 Thread Burak Yavuz
to multiply, and CoordinateMatrix to save it back again. Thanks, Burak On Wed, Jan 13, 2016 at 8:16 PM, Devi P.V <devip2...@gmail.com> wrote: I want to multiply two large matrices (from csv files) using Spark and Scala and save the output. I use the following code val rows=file
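
A sketch of that suggestion (paths and the rowIndex,colIndex,value line format are assumptions): parse each CSV line into a MatrixEntry, convert to BlockMatrix for the multiply, and save the product back as coordinates.

```scala
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Assumed input format per line: rowIndex,colIndex,value
def loadCoordinateMatrix(path: String) =
  new CoordinateMatrix(sc.textFile(path).map { line =>
    val Array(i, j, v) = line.split(",")
    MatrixEntry(i.toLong, j.toLong, v.toDouble)
  })

val a = loadCoordinateMatrix("/data/matrixA.csv").toBlockMatrix().cache()
val b = loadCoordinateMatrix("/data/matrixB.csv").toBlockMatrix().cache()

val product = a.multiply(b) // distributed block-wise multiplication

product.toCoordinateMatrix().entries
  .map(e => s"${e.i},${e.j},${e.value}")
  .saveAsTextFile("/data/product")
```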

Re: Pivot Data in Spark and Scala

2015-10-31 Thread ayan guha
mergeCombiners) .map{ case (name, mapOfYearsToValues) => (Seq(name) ++ sequenceOfYears.map(year => mapOfYearsToValues.getOrElse(year, " "))).mkString(",")} // here we assume that the sequence of all years isn't too big to not fit in memory. If

Re: Pivot Data in Spark and Scala

2015-10-30 Thread Adrian Tanase
-with-spark-ml-pipelines.html -adrian From: Deng Ching-Mallete Date: Friday, October 30, 2015 at 4:35 AM To: Ascot Moss Cc: User Subject: Re: Pivot Data in Spark and Scala Hi, You could transform it into a pair RDD then use the combineByKey function. HTH, Deng On Thu, Oct 29, 2015 at 7:29 PM, Ascot

RE: Pivot Data in Spark and Scala

2015-10-30 Thread Andrianasolo Fanilo
and you would definitely need to use a specialized timeseries library… result.foreach(println) sc.stop() Best regards, Fanilo From: Adrian Tanase [mailto:atan...@adobe.com] Sent: Friday, 30 October 2015 11:50 To: Deng Ching-Mallete; Ascot Moss Cc: User Subject: Re: Pivot Data in Spark and S

Re: Pivot Data in Spark and Scala

2015-10-30 Thread Ali Tajeldin EDU
memory. If you had to compute for each day, it may break and you would definitely need to use a specialized timeseries library… result.foreach(println) sc.stop() Best regards, Fanilo From: Adrian Tanase [mailto:atan...@adobe.com] Sent: Friday, 30 October 2015 11:50

Re: Pivot Data in Spark and Scala

2015-10-30 Thread Ruslan Dautkhanov
B, 2013, 4 I need to convert the data to a new format: A, 4,12,1 B, 24, ,4 Any idea how to make it in Spark Scala? Thanks

Pivot Data in Spark and Scala

2015-10-29 Thread Ascot Moss
Hi, I have data as follows: A, 2015, 4 A, 2014, 12 A, 2013, 1 B, 2015, 24 B, 2013, 4 I need to convert the data to a new format: A, 4,12,1 B, 24,,4 Any idea how to make it in Spark Scala? Thanks
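
With Spark 1.6+ this became a one-liner via the DataFrame pivot API (the thread predates it, hence the combineByKey suggestions above); a sketch over the sample data:

```scala
import spark.implicits._

val df = Seq(("A", 2015, 4), ("A", 2014, 12), ("A", 2013, 1),
             ("B", 2015, 24), ("B", 2013, 4)).toDF("key", "year", "value")

// One column per year; missing (key, year) pairs come out as null.
df.groupBy("key").pivot("year", Seq(2015, 2014, 2013)).sum("value").show()
```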

Re: Pivot Data in Spark and Scala

2015-10-29 Thread Deng Ching-Mallete
B, 2013, 4 I need to convert the data to a new format: A, 4,12,1 B, 24, ,4 Any idea how to make it in Spark Scala? Thanks

Re: spark and scala-2.11

2015-08-24 Thread Lanny Ripple
The instructions for building spark against scala-2.11 indicate using -Dspark-2.11. When I look in the pom.xml I find a profile named 'spark-2.11' but nothing that would indicate I should set a property. The sbt build seems to need the -Dscala-2.11 property set. Finally build/mvn does a simple

Re: spark and scala-2.11

2015-08-24 Thread Jonathan Coveney
I've used the instructions and it worked fine. Can you post exactly what you're doing, and what it fails with? Or are you just trying to understand how it works? 2015-08-24 15:48 GMT-04:00 Lanny Ripple la...@spotright.com: Hello, The instructions for building spark against scala-2.11

Re: spark and scala-2.11

2015-08-24 Thread Sean Owen
...@spotright.com wrote: Hello, The instructions for building spark against scala-2.11 indicate using -Dspark-2.11. When I look in the pom.xml I find a profile named 'spark-2.11' but nothing that would indicate I should set a property. The sbt build seems to need the -Dscala-2.11 property set. Finally

spark and scala-2.11

2015-08-24 Thread Lanny Ripple
Hello, The instructions for building spark against scala-2.11 indicate using -Dspark-2.11. When I look in the pom.xml I find a profile named 'spark-2.11' but nothing that would indicate I should set a property. The sbt build seems to need the -Dscala-2.11 property set. Finally build/mvn does

Re: Spark on scala 2.11 build fails due to incorrect jline dependency in REPL

2015-08-17 Thread Ted Yu
You were building against 1.4.x, right ? In master branch, switch-to-scala-2.11.sh is gone. There is scala-2.11 profile. FYI On Sun, Aug 16, 2015 at 11:12 AM, Stephen Boesch java...@gmail.com wrote: I am building spark with the following options - most notably the **scala-2.11**: .

Re: Spark on scala 2.11 build fails due to incorrect jline dependency in REPL

2015-08-17 Thread Stephen Boesch
In 1.4 it is change-scala-version.sh 2.11. But the problem was it is a -Dscala-2.11, not a -P. I misread the docs. 2015-08-17 14:17 GMT-07:00 Ted Yu yuzhih...@gmail.com: You were building against 1.4.x, right ? In master branch, switch-to-scala-2.11.sh is gone. There is scala-2.11

Spark on scala 2.11 build fails due to incorrect jline dependency in REPL

2015-08-16 Thread Stephen Boesch
I am building spark with the following options - most notably the **scala-2.11**: . dev/switch-to-scala-2.11.sh mvn -Phive -Pyarn -Phadoop-2.6 -Dhadoop2.6.2 -Pscala-2.11 -DskipTests -Dmaven.javadoc.skip=true clean package The build goes pretty far but fails in one of the minor modules

Re: Finding moving average using Spark and Scala

2015-07-14 Thread Feynman Liang
A good example is RegressionMetrics https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala#L48's use of OnlineMultivariateSummarizer to aggregate statistics across labels and residuals; take a look at how aggregateByKey is used

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Anupam Bagchi
wrote: A good example is RegressionMetrics https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala#L48's use of OnlineMultivariateSummarizer to aggregate statistics across labels and residuals; take a look at how

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Feynman Liang
saveAsTextFile() on it. Anupam Bagchi (c) 408.431.0780 (h) 408-873-7909 On Jul 13, 2015, at 4:52 PM, Feynman Liang fli...@databricks.com wrote: A good example is RegressionMetrics https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Anupam Bagchi
A good example is RegressionMetrics https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala#L48's use of OnlineMultivariateSummarizer to aggregate statistics across labels and residuals; take a look at how aggregateByKey

Finding moving average using Spark and Scala

2015-07-13 Thread Anupam Bagchi
I have to do the following tasks on a dataset using Apache Spark with Scala as the programming language: - Read the dataset from HDFS. A few sample lines look like this: deviceid,bytes,eventdate 15590657,246620,20150630 14066921,1907,20150621 14066921,1906,20150626 6522013,2349,20150626
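
For reference, a sketch of one way to compute a moving average over this sample data with DataFrame window functions; the 7-row window size is an illustrative assumption.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(
  (15590657L, 246620L, "20150630"),
  (14066921L, 1907L,   "20150621"),
  (14066921L, 1906L,   "20150626"),
  (6522013L,  2349L,   "20150626")
).toDF("deviceid", "bytes", "eventdate")

// Per device, average bytes over the current row and the 6 preceding rows,
// ordered by date.
val w = Window.partitionBy("deviceid").orderBy("eventdate").rowsBetween(-6, 0)
df.withColumn("moving_avg_bytes", avg(col("bytes")).over(w)).show()
```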

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Feynman Liang
On Mon, Jul 13, 2015 at 10:07 AM, Anupam Bagchi anupam_bag...@rocketmail.com wrote: I have to do the following tasks on a dataset using Apache Spark with Scala as the programming language: 1. Read the dataset from HDFS. A few sample lines look like this: deviceid,bytes

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Feynman Liang
A good example is RegressionMetrics https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala#L48's use of OnlineMultivariateSummarizer to aggregate statistics across labels and residuals; take a look at how aggregateByKey is used

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Anupam Bagchi
into an array/tuple using a mapValues first and then write. On Mon, Jul 13, 2015 at 10:07 AM, Anupam Bagchi anupam_bag...@rocketmail.com wrote: I have to do the following tasks on a dataset using Apache Spark with Scala as the programming language: Read

Calculating moving average of dataset in Apache Spark and Scala

2015-07-12 Thread Anupam Bagchi
I have to do the following tasks on a dataset using Apache Spark with Scala as the programming language: Read the dataset from HDFS. A few sample lines look like this: deviceid,bytes,eventdate 15590657,246620,20150630 14066921,1907,20150621 14066921,1906,20150626 6522013,2349,20150626

Moving average using Spark and Scala

2015-07-12 Thread Anupam Bagchi
I have to do the following tasks on a dataset using Apache Spark with Scala as the programming language: Read the dataset from HDFS. A few sample lines look like this: deviceid,bytes,eventdate 15590657,246620,20150630 14066921,1907,20150621 14066921,1906,20150626 6522013,2349,20150626

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-17 Thread Michael Allman
some serious stability problems simply trying to run the Spark 1.3 Scala 2.11 REPL. Most of the time it fails to load and spews a torrent of compiler assertion failures, etc. See attached. spark@dp-cluster-master-node-001:~/spark/bin$ spark-shell Spark Command: java -cp /opt/spark/conf:/opt/spark/lib

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-17 Thread Sean Owen
on 2.11 is here: http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211 Also, I'm experiencing some serious stability problems simply trying to run the Spark 1.3 Scala 2.11 REPL. Most of the time it fails to load and spews a torrent of compiler assertion failures, etc. See

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-17 Thread Michael Allman
to run the Spark 1.3 Scala 2.11 REPL. Most of the time it fails to load and spews a torrent of compiler assertion failures, etc. See attached. Unfortunately, it appears the Spark 1.3 Scala 2.11 REPL is simply not ready for production use. I was going to file a bug, but it seems clear

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-17 Thread Sean Owen
/latest/building-spark.html#building-for-scala-211 Also, I'm experiencing some serious stability problems simply trying to run the Spark 1.3 Scala 2.11 REPL. Most of the time it fails to load and spews a torrent of compiler assertion failures, etc. See attached. Unfortunately, it appears

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-17 Thread Michael Allman
trying to run the Spark 1.3 Scala 2.11 REPL. Most of the time it fails to load and spews a torrent of compiler assertion failures, etc. See attached. Unfortunately, it appears the Spark 1.3 Scala 2.11 REPL is simply not ready for production use. I was going to file a bug, but it seems

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-09 Thread Alex Nakos
-2988 Any help in getting this working would be much appreciated! Thanks Alex On Thu, Apr 9, 2015 at 11:32 AM, Prashant Sharma scrapco...@gmail.com wrote: You are right this needs to be done. I can work on it soon, I was not sure if there is any one even using scala 2.11 spark repl. Actually

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-09 Thread Alex Nakos
was not sure if there is any one even using scala 2.11 spark repl. Actually there is a patch in scala 2.10 shell to support adding jars (Lost the JIRA ID), which has to be ported for scala 2.11 too. If however, you(or anyone else) are planning to work, I can help you ? Prashant Sharma On Thu

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-09 Thread Prashant Sharma
You are right this needs to be done. I can work on it soon, I was not sure if there is any one even using scala 2.11 spark repl. Actually there is a patch in scala 2.10 shell to support adding jars (Lost the JIRA ID), which has to be ported for scala 2.11 too. If however, you(or anyone else

Re: Running Spark from Scala source files other than main file

2015-03-11 Thread Imran Rashid
did you forget to specify the main class w/ --class Main? though if that was it, you should at least see *some* error message, so I'm confused myself ... On Wed, Mar 11, 2015 at 6:53 AM, Aung Kyaw Htet akh...@gmail.com wrote: Hi Everyone, I am developing a scala app, in which the main object

Re: How to pass parameters to a spark-jobserver Scala class?

2015-02-19 Thread Vasu C
by Twitter. Regards, Vasu C

Re: How to pass parameters to a spark-jobserver Scala class?

2015-02-18 Thread Vasu C

Re: How to pass parameters to a spark-jobserver Scala class?

2015-02-18 Thread Sasi
in curl -d ...) during job run? Hope I make sense. Sasi

Re: How to pass parameters to a spark-jobserver Scala class?

2015-02-17 Thread Vasu C
https://github.com/spark-jobserver/spark-jobserver Regards, Vasu C
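
For reference, a sketch of the classic spark-jobserver job API (based on the project's README; treat the details as assumptions): parameters posted with curl -d arrive as a Typesafe Config that the job reads in runJob.

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

object WordCountJob extends SparkJob {
  // Reject the job early if the expected parameter is missing.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.string")) SparkJobValid
    else SparkJobInvalid("missing input.string")

  override def runJob(sc: SparkContext, config: Config): Any = {
    val text = config.getString("input.string") // e.g. curl -d "input.string = a b a"
    sc.parallelize(text.split(" ").toSeq).countByValue()
  }
}
```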

Re: Spark and Scala

2014-09-13 Thread Deep Pradhan
pradhandeep1...@gmail.com wrote: I know that unpersist is a method on RDD. But my confusion is that, when we port our Scala programs to Spark, doesn't everything change to RDDs? On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: unpersist is a method

Re: Spark and Scala

2014-09-13 Thread Mark Hamstra
, Deep Pradhan pradhandeep1...@gmail.com wrote: I know that unpersist is a method on RDD. But my confusion is that, when we port our Scala programs to Spark, doesn't everything change to RDDs? On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: unpersist

Re: Spark and Scala

2014-09-13 Thread Deep Pradhan
that unpersist is a method on RDD. But my confusion is that, when we port our Scala programs to Spark, doesn't everything change to RDDs? On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: unpersist is a method on RDDs. RDDs are abstractions introduced

Re: Spark and Scala

2014-09-13 Thread Mark Hamstra
confusion is that, when we port our Scala programs to Spark, doesn't everything change to RDDs? On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: unpersist is a method on RDDs. RDDs are abstractions introduced by Spark. An Int is just a Scala Int. You can't call

Spark and Scala

2014-09-12 Thread Deep Pradhan
There is one thing that I am confused about. Spark has code that has been implemented in Scala. Now, can we run any Scala code on the Spark framework? What will be the difference in the execution of the Scala code in normal systems and on Spark? The reason for my question is the following: I had

Re: Spark and Scala

2014-09-12 Thread Nicholas Chammas
unpersist is a method on RDDs. RDDs are abstractions introduced by Spark. An Int is just a Scala Int. You can't call unpersist on Int in Scala, and that doesn't change in Spark. On Fri, Sep 12, 2014 at 12:33 PM, Deep Pradhan pradhandeep1...@gmail.com wrote: There is one thing that I am
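
A two-line illustration of that distinction: persist/unpersist exist on RDDs, while a plain Scala value has no such notion.

```scala
val nums = sc.parallelize(1 to 10).cache() // an RDD can be persisted...
nums.unpersist()                           // ...and unpersisted
val x: Int = 42                            // a plain Scala Int cannot
// x.unpersist()                           // does not compile: no such method
```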

Re: Spark and Scala

2014-09-12 Thread Deep Pradhan
I know that unpersist is a method on RDD. But my confusion is that, when we port our Scala programs to Spark, doesn't everything change to RDDs? On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: unpersist is a method on RDDs. RDDs are abstractions introduced

Re: Spark and Scala

2014-09-12 Thread Hari Shreedharan
compilation etc). On Fri, Sep 12, 2014 at 8:04 PM, Deep Pradhan pradhandeep1...@gmail.com wrote: I know that unpersist is a method on RDD. But my confusion is that, when we port our Scala programs to Spark, doesn't everything change to RDDs? On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas

Re: Spark and Scala

2014-09-12 Thread Deep Pradhan
port our Scala programs to Spark, doesn't everything change to RDDs? On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: unpersist is a method on RDDs. RDDs are abstractions introduced by Spark. An Int is just a Scala Int. You can't call unpersist on Int

Re: Spark and Scala

2014-09-12 Thread Soumya Simanta
compilation etc). On Fri, Sep 12, 2014 at 8:04 PM, Deep Pradhan pradhandeep1...@gmail.com wrote: I know that unpersist is a method on RDD. But my confusion is that, when we port our Scala programs to Spark, doesn't everything change to RDDs? On Fri, Sep 12, 2014 at 10:16 PM, Nicholas

Re: Running spark examples/scala scripts

2014-03-19 Thread Pariksheet Barapatre
:-) Thanks for the suggestion. I was actually asking how to run spark scripts as a standalone App. I am able to run Java code and Python code as a standalone app. One more question: the documentation says that to read an HDFS file, we need to add the dependency <groupId>org.apache.hadoop</groupId>
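
For an sbt build, the equivalent of that Maven <groupId> snippet would be along these lines (the version is a placeholder to match your cluster):

```scala
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0"
```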