RE: newbie question for reduce

2022-01-27 Thread Christopher Robson
the error that you are seeing. There are several ways you could fix it. One way is to use a map before the reduce, e.g. rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y) Hope that's helpful, Chris -Original Message- From: capitnfrak...@free.fr Sent: 19 January 2022 02:41 To: user@spark.apache.org
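A runnable PySpark sketch of that fix, assuming a local SparkContext (app name and master are illustrative):

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "ReduceExample")
    rdd = sc.parallelize([("a", 1), ("b", 2), ("c", 3)])

    # Map each tuple to its integer value first, so reduce combines ints with ints.
    total = rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y)
    print(total)  # 6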

Re: newbie question for reduce

2022-01-18 Thread Sean Owen
The problem is that you are reducing a list of tuples, but you are producing an int. The resulting int can't be combined with other tuples with your function. reduce() has to produce the same type as its arguments. rdd.map(lambda x: x[1]).reduce(lambda x,y: x+y) ... would work On Tue, Jan 18,

newbie question for reduce

2022-01-18 Thread capitnfrakass
Hello Please help take a look at why this simple reduce doesn't work: rdd = sc.parallelize([("a",1),("b",2),("c",3)]) rdd.reduce(lambda x,y: x[1]+y[1]) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/opt/spark/python/pyspark/rdd.py", line 1001, in reduce return

Re: Spark Newbie question

2019-07-11 Thread infa elance
Thanks Jerry for the clarification. Ajay. On Thu, Jul 11, 2019 at 12:48 PM Jerry Vinokurov wrote: > Hi Ajay, > > When a Spark SQL statement references a table, that table has to be > "registered" first. Usually the way this is done is by reading in a > DataFrame, then calling the

Re: Spark Newbie question

2019-07-11 Thread Jerry Vinokurov
Hi Ajay, When a Spark SQL statement references a table, that table has to be "registered" first. Usually the way this is done is by reading in a DataFrame, then calling the createOrReplaceTempView (or one of a few other functions) on that data frame, with the argument being the name under which
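A minimal PySpark sketch of that registration pattern; the input file and view name are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("RegisterExample").getOrCreate()

    # Read a DataFrame, then register it under a name that SQL can reference.
    df = spark.read.json("people.json")  # placeholder input
    df.createOrReplaceTempView("people")

    # SQL statements can now reference the registered view by name.
    spark.sql("SELECT count(*) FROM people").show()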

Re: Spark Newbie question

2019-07-11 Thread infa elance
Sorry, I guess I hit the send button too soon. This question is regarding a spark stand-alone cluster. My understanding is spark is an execution engine and not a storage layer. Spark processes data in memory, but when someone refers to a spark table created through sparksql(df/rdd), what exactly

Spark Newbie question

2019-07-11 Thread infa elance
This is a stand-alone spark cluster. My understanding is spark is an execution engine and not a storage layer. Spark processes data in memory, but when someone refers to a spark table created through sparksql(df/rdd), what exactly are they referring to? Could it be a Hive table? If yes, is it the

Re: Newbie question on how to extract column value

2018-08-07 Thread James Starks
Because of some legacy issues I can't immediately upgrade the spark version. But I tried filtering data before loading it into spark, based on the suggestion: val df = sparkSession.read.format("jdbc").option(...).option("dbtable", "(select .. from ... where url <> '') table_name").load()
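The same pushdown trick as a hedged PySpark sketch; the connection URL and credentials are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("JdbcPushdown").getOrCreate()

    # A parenthesized query aliased as a table runs the WHERE clause in the
    # database, so the empty-url rows never reach Spark.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://localhost:5432/mydb")  # placeholder
          .option("dbtable", "(select id, url from table_a where url <> '') t")
          .option("user", "dbuser").option("password", "secret")   # placeholders
          .load())
    df.show()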

Re: Newbie question on how to extract column value

2018-08-07 Thread Gourav Sengupta
Hi James, It is always advisable to use the latest SPARK version. That said, can you please give dataframes and udf a try if possible? I think that would be a much more scalable way to address the issue. Also, where possible, it is always advisable to use the filter option before fetching the

Newbie question on how to extract column value

2018-08-07 Thread James Starks
I am very new to Spark. I just successfully set up Spark SQL connecting to a postgresql database, and am able to display the table with the code sparkSession.sql("SELECT id, url from table_a where col_b <> '' ").show() Now I want to perform filter and map functions on the col_b value. In plain scala it
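A small PySpark sketch of the filter-and-map step the replies above suggest, using a UDF. It assumes table_a is already readable as in the question, and the host-extraction logic is purely hypothetical:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("ColumnTransform").getOrCreate()
    df = spark.sql("SELECT id, url FROM table_a WHERE col_b <> ''")

    # A hypothetical per-value transformation expressed as a UDF.
    extract_host = F.udf(lambda u: u.split("/")[2] if "//" in u else u, StringType())

    df.filter(F.col("url").isNotNull()) \
      .withColumn("host", extract_host(F.col("url"))) \
      .show()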

Re: newbie question about RDD

2016-11-22 Thread Mohit Durgapal
Hi Raghav, Please refer to the following code: SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("PersonApp"); //creating java spark context JavaSparkContext sc = new JavaSparkContext(sparkConf); //reading file from hdfs into spark rdd; the name node is localhost JavaRDD

Re: newbie question about RDD

2016-11-21 Thread Raghav
Sorry, I forgot to ask: how can I use the spark context here? I have the hdfs directory path of the files, as well as the name node of the hdfs cluster. Thanks for your help. On Mon, Nov 21, 2016 at 9:45 PM, Raghav wrote: > Hi > > I am extremely new to Spark. I have to read a file

newbie question about RDD

2016-11-21 Thread Raghav
Hi I am extremely new to Spark. I have to read a file from HDFS, and get it in memory in RDD format. I have a Java class as follows: class Person { private long UUID; private String FirstName; private String LastName; private String zip; // public methods } The file in
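The reply above uses Java; here is a PySpark sketch of the same idea, where the HDFS URL and the comma delimiter are assumptions:

    from collections import namedtuple
    from pyspark import SparkContext

    Person = namedtuple("Person", ["uuid", "first_name", "last_name", "zip"])

    sc = SparkContext(appName="PersonApp")

    # Assumes one comma-separated person per line, name node on localhost.
    lines = sc.textFile("hdfs://localhost:9000/data/persons.txt")
    people = lines.map(lambda line: Person(*line.split(",")))

    print(people.take(5))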

Re: Newbie question - Best way to bootstrap with Spark

2016-11-14 Thread Jon Gregg

Re: Newbie question - Best way to bootstrap with Spark

2016-11-14 Thread Rishikesh Teke
Integrate spark with apache zeppelin (https://zeppelin.apache.org/); it's again a very handy way to bootstrap with spark.

Re: Newbie question - Best way to bootstrap with Spark

2016-11-10 Thread jggg777
MapReduce cluster with Spark pre-installed, but you'll need to sign up for an AWS account.

Re: Newbie question - Best way to bootstrap with Spark

2016-11-07 Thread Raghav

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Denny Lee

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Raghav

Re: Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread warmb...@qq.com
From: ayan guha Date: 2016-11-07 10:08 To: raghav CC: user Subject: Re: Newbie question - Best way to bootstrap with Spark I would start with Spark documentation, really. Then you would probably start with some older videos from youtube, especially spark summit 2014, 2015 and 2016 videos. Regards

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread ayan guha

Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread raghav
for some guidance for starter material, or videos. Thanks. Raghav

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Tristan Nixon
>> Hi >> Any help appreciated on this. I am trying to write a Spark program using IntelliJ. I get a run time error as soon as new SparkConf() is called from main. Top few lines of the exception are pasted below. >> These are the following versions:

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Tristan Nixon
1 > 1.6.0 > > I have installed the Scala plugin in IntelliJ and added a dependency. > > I have also added a library dependency in the project structure. > > Thanks for any help! > > Vasu > > > Exception in thread "main" java.lang.NoSuchMethodError: > scala.P

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Jacek Laskowski

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Jacek Laskowski

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Ted Yu

Newbie question - Help with runtime error on augmentString

2016-03-11 Thread vasu20
Ljava/lang/String;)Ljava/lang/String; at org.apache.spark.util.Utils$.<init>(Utils.scala:1682) at org.apache.spark.util.Utils$.<clinit>(Utils.scala) at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)

Re: Newbie question

2016-01-07 Thread Deepak Sharma
Yes, you can do it unless the method is marked static/final. Most of the methods in SparkContext are marked static, so you definitely can't override those; otherwise overriding would usually work. Thanks Deepak On Fri, Jan 8, 2016 at 12:06 PM, yuliya Feldman wrote: >
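In PySpark, where static/final barriers don't apply, overriding looks like this sketch; the logging wrapper is hypothetical:

    from pyspark import SparkContext

    class LoggingSparkContext(SparkContext):
        # Override a regular method to add behaviour around the original.
        def textFile(self, name, minPartitions=None, use_unicode=True):
            print("loading %s" % name)
            return super(LoggingSparkContext, self).textFile(name, minPartitions, use_unicode)

    sc = LoggingSparkContext("local[2]", "OverrideExample")
    print(sc.textFile("README.md").count())  # placeholder path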

Re: Newbie question

2016-01-07 Thread censj
You can try it. > On 2016-01-08, 14:44, yuliya Feldman wrote: > > invoked

Newbie question

2016-01-07 Thread yuliya Feldman
Hello, I am new to Spark and have what is most likely a basic question - can I override a method from SparkContext? Thanks

Re: Newbie question

2016-01-07 Thread yuliya Feldman
<yufeld...@yahoo.com> Cc: "user@spark.apache.org" <user@spark.apache.org> Sent: Thursday, January 7, 2016 10:38 PM Subject: Re: Newbie question Why override a method from SparkContext? On 2016-01-08, 14:36, yuliya Feldman <yufeld...@yahoo.com.INVALID> wrote: Hello, I am new to

Re: Newbie question

2016-01-07 Thread yuliya Feldman
Thank you From: Deepak Sharma <deepakmc...@gmail.com> To: yuliya Feldman <yufeld...@yahoo.com> Cc: "user@spark.apache.org" <user@spark.apache.org> Sent: Thursday, January 7, 2016 10:41 PM Subject: Re: Newbie question Yes, you can do it unless

Re: Newbie question

2016-01-07 Thread dEEPU
If the method is not final or static then you can. On Jan 8, 2016 12:07 PM, yuliya Feldman wrote: Hello, I am new to Spark and have what is most likely a basic question - can I override a method from SparkContext? Thanks

Spark ML/MLib newbie question

2015-10-19 Thread George Paulson

Re: Newbie question: what makes Spark run faster than MapReduce

2015-08-07 Thread Hien Luu
This blog outlines a few things that make Spark faster than MapReduce - https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html On Fri, Aug 7, 2015 at 9:13 AM, Muler mulugeta.abe...@gmail.com wrote: Consider the classic word count application over a 4 node cluster with a sizable

Re: Newbie question: what makes Spark run faster than MapReduce

2015-08-07 Thread Corey Nolet
1) Spark only needs to shuffle when data needs to be partitioned around the workers in an all-to-all fashion. 2) Multi-stage jobs that would normally require several map reduce jobs, forcing data to be dumped to disk between the jobs, can instead cache intermediate results in memory.
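A minimal PySpark sketch of point 2, caching an intermediate RDD so later actions reuse it instead of recomputing from the source (the input path is a placeholder):

    from pyspark import SparkContext

    sc = SparkContext("local[4]", "CacheExample")

    words = sc.textFile("hdfs:///data/corpus.txt").flatMap(lambda line: line.split())
    counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b).cache()

    # Both actions reuse the cached result; the source file is read only once.
    print(counts.count())
    print(counts.take(10))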

Newbie question: what makes Spark run faster than MapReduce

2015-08-07 Thread Muler
Consider the classic word count application over a 4 node cluster with sizable working data. What makes Spark run faster than MapReduce, considering that Spark also has to write to disk during shuffle?

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Muler
Thanks! On Wed, Aug 5, 2015 at 5:24 PM, Saisai Shao sai.sai.s...@gmail.com wrote: Yes, finally shuffle data will be written to disk for the reduce stage to pull, no matter how large you set the shuffle memory fraction. Thanks Saisai On Thu, Aug 6, 2015 at 7:50 AM, Muler

Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Muler
Hi, Consider I'm running WordCount with 100m of data on a 4 node cluster. Assuming my RAM size on each node is 200g and I'm giving my executors 100g (just enough memory for 100m data): 1. If I have enough memory, can Spark 100% avoid writing to disk? 2. During shuffle, where results have to

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Saisai Shao
Hi Muler, Shuffle data will be written to disk no matter how much memory you have; large memory can alleviate shuffle spill, where temporary files are generated when memory is not enough. Yes, each node writes shuffle data to file, and it is pulled from disk in the reduce stage through the network framework
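As a sketch only: the knob discussed in this thread is the legacy Spark 1.x setting, later superseded by unified memory management:

    from pyspark import SparkConf, SparkContext

    # More shuffle memory means fewer spills to temporary files, but the
    # final shuffle output is still written to disk for reducers to fetch.
    conf = (SparkConf()
            .setAppName("ShuffleTuning")
            .set("spark.shuffle.memoryFraction", "0.5"))  # 1.x default was 0.2
    sc = SparkContext(conf=conf)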

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Muler
thanks, so if I have large enough memory (with enough spark.shuffle.memory) then shuffle (in-memory shuffle) spill doesn't happen (per node), but shuffle data still has to be ultimately written to disk so that the reduce stage pulls it across the network? On Wed, Aug 5, 2015 at 4:40 PM, Saisai Shao

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Saisai Shao
Yes, finally shuffle data will be written to disk for the reduce stage to pull, no matter how large you set the shuffle memory fraction. Thanks Saisai On Thu, Aug 6, 2015 at 7:50 AM, Muler mulugeta.abe...@gmail.com wrote: thanks, so if I have large enough memory (with enough spark.shuffle.memory)

Re: MLlib/kmeans newbie question(s)

2015-03-09 Thread Xiangrui Meng
You need to change `== 1` to `== i`. `println(t)` happens on the workers, which may not be what you want. Try the following: noSets.filter(t => model.predict(Utils.featurize(t)) == i).collect().foreach(println) -Xiangrui On Sat, Mar 7, 2015 at 3:20 PM, Pierce Lamb richard.pierce.l...@gmail.com
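A self-contained PySpark version of the corrected pattern; the tiny corpus and hashing featurizer are stand-ins for the reference app's:

    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans
    from pyspark.mllib.feature import HashingTF

    sc = SparkContext("local[2]", "KMeansExample")

    texts = sc.parallelize(["spark is fast", "hadoop map reduce", "spark streaming"])
    tf = HashingTF(numFeatures=100)
    model = KMeans.train(texts.map(lambda t: tf.transform(t.split())), k=2)

    # Filter on the workers, but collect() before printing: print inside a
    # closure runs on the executors, so its output never reaches the driver.
    for i in range(2):
        for t in texts.filter(lambda t: model.predict(tf.transform(t.split())) == i).collect():
            print(i, t)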

MLlib/kmeans newbie question(s)

2015-03-07 Thread Pierce Lamb
Hi all, I'm very new to machine learning algorithms and Spark. I'm following the Twitter Streaming Language Classifier found here: http://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/README.html Specifically this code:

Re: Newbie Question on How Tasks are Executed

2015-01-19 Thread davidkl
Hello Mixtou, if you want to look at partition ID, I believe you want to use mapPartitionsWithIndex
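A quick PySpark illustration of that API (the original thread is Java, but the semantics are the same):

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "PartitionIndexExample")
    rdd = sc.parallelize(range(10), 4)

    # The function receives the partition index along with that partition's iterator.
    def tag_with_partition(idx, it):
        return ((idx, x) for x in it)

    print(rdd.mapPartitionsWithIndex(tag_with_partition).collect())
    # e.g. [(0, 0), (0, 1), (1, 2), ...]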

Newbie Question on How Tasks are Executed

2015-01-09 Thread mixtou
{ System.out.println("Guaranteed Word: " + tuple._1 + " with count: " + tuple._2(0) + " and error: " + tuple._2(1)); } } } }

A spark newbie question

2015-01-04 Thread Dinesh Vallabhdas
A spark cassandra newbie question. Thanks in advance for the help. I have a cassandra table with 2 columns, message_timestamp (timestamp) and message_type (text). The data is of the form: 2014-06-25 12:01:39 START 2014-06-25 12:02:39 START 2014-06-25 12:02:39 PAUSE 2014-06-25 14:02:39 STOP 2014-06-25

A spark newbie question on summary statistics

2015-01-04 Thread anondin
A spark cassandra newbie question. Appreciate the help. I have a cassandra table with 2 columns, message_timestamp (timestamp) and message_type (text). The data is of the form: 2014-06-25 12:01:39 START 2014-06-25 12:02:39 START 2014-06-25 12:02:39 PAUSE 2014-06-25 14:02:39 STOP 2014-06

Re: A spark newbie question

2015-01-04 Thread Aniket Bhatnagar
Go through spark API documentation. Basically you have to do group by (date, message_type) and then do a count. On Sun, Jan 4, 2015, 9:58 PM Dinesh Vallabhdas dines...@yahoo.com.invalid wrote: A spark cassandra newbie question. Thanks in advance for the help. I have a cassandra table with 2
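A DataFrame sketch of that group-and-count, using today's API; the sample rows mirror the question:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("SessionCounts").getOrCreate()

    rows = [("2014-06-25 12:01:39", "START"),
            ("2014-06-25 12:02:39", "START"),
            ("2014-06-25 12:02:39", "PAUSE"),
            ("2014-06-25 14:02:39", "STOP")]
    df = spark.createDataFrame(rows, ["message_timestamp", "message_type"])

    # Group by calendar date and message type, then count.
    (df.groupBy(F.to_date("message_timestamp").alias("date"), "message_type")
       .count()
       .show())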

Re: A spark newbie question

2015-01-04 Thread Sanjay Subramanian
as the language and use Spark. It's exciting. regards sanjay From: Aniket Bhatnagar aniket.bhatna...@gmail.com To: Dinesh Vallabhdas dines...@yahoo.com; user@spark.apache.org Sent: Sunday, January 4, 2015 11:07 AM Subject: Re: A spark newbie question Go through

Newbie Question

2014-12-11 Thread Fernando O.
Hi guys, I'm planning to use spark on a project and I'm facing a problem; I couldn't find a log that explains what's wrong with what I'm doing. I have 2 vms that run a small hadoop (2.6.0) cluster. I added a file that has 50 lines of json data. Compiled spark, all tests passed, I run some

newbie question quickstart example sbt issue

2014-10-28 Thread nl19856

Re: newbie question quickstart example sbt issue

2014-10-28 Thread Yanbo Liang
-project:simple-project_2.10:1.0 sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.10;1.1.0: not found What am I doing wrong? Regards Hans-Peter

Re: newbie question quickstart example sbt issue

2014-10-28 Thread nl19856

Re: newbie question quickstart example sbt issue

2014-10-28 Thread Akhil Das

JDBC Connections / newbie question

2014-07-20 Thread Ahmed Ibrahim
Hi All, In a JAVA based scenario where we have a large Oracle DB and want to use spark to do some distributed analysis on the data -- in such a case, how exactly do we go about defining a JDBC connection and querying the data? thanks, -- Ahmed Osama Ibrahim ITSC International
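A hedged PySpark sketch of one way to wire that up; the driver jar path, URL, credentials, and table are all placeholders, and the same options apply from Java through DataFrameReader:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("OracleRead")
             .config("spark.jars", "/path/to/ojdbc8.jar")  # Oracle JDBC driver jar
             .getOrCreate())

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@dbhost:1521:ORCL")  # placeholder host/SID
          .option("dbtable", "SCHEMA.SOME_TABLE")               # placeholder table
          .option("user", "scott").option("password", "tiger")  # placeholders
          .option("driver", "oracle.jdbc.OracleDriver")
          .load())

    # The analysis itself then runs distributed across the cluster.
    df.groupBy("SOME_COLUMN").count().show()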