RE: newbie question for reduce

2022-01-27 Thread Christopher Robson
Hi,

The reduce lambda receives as its first argument the value returned by the 
previous invocation. The first time, it is invoked with:
x = ("a", 1), y = ("b", 2)
and returns 1 + 2 = 3.
The second time, it is invoked with
x = 3, y = ("c", 3)
so you can see why it raises the error you are seeing: x is now a plain int, 
and x[1] is not valid on an int.

There are several ways you could fix it. One way is to use a map before the 
reduce, e.g. 
rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y)

Hope that's helpful,

Chris

-Original Message-
From: capitnfrak...@free.fr  
Sent: 19 January 2022 02:41
To: user@spark.apache.org
Subject: newbie question for reduce

Hello

Please help me take a look at why this simple reduce doesn't work.

>>> rdd = sc.parallelize([("a",1),("b",2),("c",3)])
>>> 
>>> rdd.reduce(lambda x,y: x[1]+y[1])
Traceback (most recent call last):
   File "", line 1, in 
   File "/opt/spark/python/pyspark/rdd.py", line 1001, in reduce
 return reduce(f, vals)
   File "/opt/spark/python/pyspark/util.py", line 74, in wrapper
 return f(*args, **kwargs)
   File "", line 1, in 
TypeError: 'int' object is not subscriptable
>>> 


spark 3.2.0

Thank you.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org





Re: newbie question for reduce

2022-01-18 Thread Sean Owen
The problem is that you are reducing a list of tuples, but you are
producing an int. The resulting int can't be combined with other tuples
with your function. reduce() has to produce the same type as its arguments.
rdd.map(lambda x: x[1]).reduce(lambda x,y: x+y)
... would work

On Tue, Jan 18, 2022 at 8:41 PM  wrote:

> Hello
>
> Please help me take a look at why this simple reduce doesn't work.
>
> >>> rdd = sc.parallelize([("a",1),("b",2),("c",3)])
> >>>
> >>> rdd.reduce(lambda x,y: x[1]+y[1])
> Traceback (most recent call last):
>File "", line 1, in 
>File "/opt/spark/python/pyspark/rdd.py", line 1001, in reduce
>  return reduce(f, vals)
>File "/opt/spark/python/pyspark/util.py", line 74, in wrapper
>  return f(*args, **kwargs)
>File "", line 1, in 
> TypeError: 'int' object is not subscriptable
> >>>
>
>
> spark 3.2.0
>
> Thank you.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Newbie question on how to extract column value

2018-08-07 Thread James Starks
Because of some legacy issues I can't immediately upgrade the Spark version. But I 
tried filtering the data before loading it into Spark, based on the suggestion, with:

 val df = sparkSession.read.format("jdbc").option(...).option("dbtable", 
"(select .. from ... where url <> '') table_name").load()
 df.createOrReplaceTempView("new_table")

Then performing the custom operation does the trick.

sparkSession.sql("select id, url from new_table").as[(String, String)].map 
{ case (id, url) =>
   val derived_data = ... // operation on url
   (id, derived_data)
}.show()

Thanks for the advice, it's really helpful!

‐‐‐ Original Message ‐‐‐
On August 7, 2018 5:33 PM, Gourav Sengupta  wrote:

> Hi James,
>
> It is always advisable to use the latest Spark version. That said, can you 
> please give DataFrames and UDFs a try if possible? I think that would be 
> a much more scalable way to address the issue.
>
> Also in case possible, it is always advisable to use the filter option before 
> fetching the data to Spark.
>
> Thanks and Regards,
> Gourav
>
> On Tue, Aug 7, 2018 at 4:09 PM, James Starks  
> wrote:
>
>> I am very new to Spark. Just successfully setup Spark SQL connecting to 
>> postgresql database, and am able to display table with code
>>
>> sparkSession.sql("SELECT id, url from table_a where col_b <> '' ").show()
>>
>> Now I want to perform filter and map function on col_b value. In plain scala 
>> it would be something like
>>
>> Seq((1, "http://a.com/a;), (2, "http://b.com/b;), (3, "unknown")).filter 
>> { case (_, url) => isValid(url) }.map { case (id, url)  => (id, pathOf(url)) 
>> }
>>
>> where filter will remove invalid url, and then map (id, url) to (id, path of 
>> url).
>>
>> However, when applying this concept to spark sql with code snippet
>>
>> sparkSession.sql("...").filter(isValid($"url"))
>>
>> Compiler complains type mismatch because $"url" is ColumnName type. How can 
>> I extract column value i.e. http://... for the column url in order to 
>> perform filter function?
>>
>> Thanks
>>
>> Java 1.8.0
>> Scala 2.11.8
>> Spark 2.1.0

Re: Newbie question on how to extract column value

2018-08-07 Thread Gourav Sengupta
Hi James,

It is always advisable to use the latest Spark version. That said, can you
please give DataFrames and UDFs a try if possible? I think that would
be a much more scalable way to address the issue.

Also in case possible, it is always advisable to use the filter option
before fetching the data to Spark.
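
A rough sketch of what the UDF route could look like (illustrative only, not
tested against your tables; isValid is a placeholder for your own check, and
sparkSession is assumed to be in scope):

import org.apache.spark.sql.functions.udf
import sparkSession.implicits._

// placeholder predicate -- swap in your real validation logic
val isValidUdf = udf((url: String) => url != null && url.startsWith("http"))

sparkSession.sql("select id, url from table_a")
  .filter(isValidUdf($"url"))            // filter on the column value
  .as[(String, String)]
  .map { case (id, url) => (id, url) }   // replace with your pathOf(url) step
  .show()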


Thanks and Regards,
Gourav

On Tue, Aug 7, 2018 at 4:09 PM, James Starks  wrote:

> I am very new to Spark. Just successfully setup Spark SQL connecting to
> postgresql database, and am able to display table with code
>
> sparkSession.sql("SELECT id, url from table_a where col_b <> ''
> ").show()
>
> Now I want to perform filter and map function on col_b value. In plain
> scala it would be something like
>
> Seq((1, "http://a.com/a;), (2, "http://b.com/b;), (3,
> "unknown")).filter { case (_, url) => isValid(url) }.map { case (id, url)
> => (id, pathOf(url)) }
>
> where filter will remove invalid url, and then map (id, url) to (id, path
> of url).
>
> However, when applying this concept to spark sql with code snippet
>
> sparkSession.sql("...").filter(isValid($"url"))
>
> Compiler complains type mismatch because $"url" is ColumnName type. How
> can I extract column value i.e. http://... for the column url in order to
> perform filter function?
>
> Thanks
>
> Java 1.8.0
> Scala 2.11.8
> Spark 2.1.0
>
>
>
>
>
>


Re: newbie question about RDD

2016-11-22 Thread Mohit Durgapal
Hi Raghav,

Please refer to the following code:

SparkConf sparkConf = new
SparkConf().setMaster("local[2]").setAppName("PersonApp");

//creating java spark context

JavaSparkContext sc = new JavaSparkContext(sparkConf);

//reading file from hdfs into a spark rdd, the name node is localhost
JavaRDD<String> personStringRDD =
sc.textFile("hdfs://localhost:9000/custom/inputPersonFile.txt");


//Converting from a String RDD to a Person RDD ... this is just an example;
//you can replace the parsing with better exception-handled code

JavaRDD<Person> personObjectRDD = personStringRDD.map(personRow -> {
String[] personValues = personRow.split("\t");

return new Person(Long.parseLong(personValues[0]),
personValues[1], personValues[2],
personValues[3]);
});

//finally just printing the count of objects
System.out.println("Person count = "+personObjectRDD.count());


Regards
Mohit


On Tue, Nov 22, 2016 at 11:17 AM, Raghav  wrote:

> Sorry I forgot to ask how can I use spark context here ? I have hdfs
> directory path of the files, as well as the name node of hdfs cluster.
>
> Thanks for your help.
>
> On Mon, Nov 21, 2016 at 9:45 PM, Raghav  wrote:
>
>> Hi
>>
>> I am extremely new to Spark. I have to read a file form HDFS, and get it
>> in memory  in RDD format.
>>
>> I have a Java class as follows:
>>
>> class Person {
>> private long UUID;
>> private String FirstName;
>> private String LastName;
>> private String zip;
>>
>>// public methods
>> }
>>
>> The file in HDFS is as follows:
>>
>> UUID   FirstName   LastName   Zip
>> 7462   John        Doll       06903
>> 5231   Brad        Finley     32820
>>
>>
>> Can someone point me to how to get a JavaRDD<Person> object by reading the
>> file in HDFS?
>>
>> Thanks.
>>
>> --
>> Raghav
>>
>
>
>
> --
> Raghav
>


Re: newbie question about RDD

2016-11-21 Thread Raghav
Sorry, I forgot to ask: how can I use the Spark context here? I have the HDFS
directory path of the files, as well as the name node of the HDFS cluster.

Thanks for your help.

On Mon, Nov 21, 2016 at 9:45 PM, Raghav  wrote:

> Hi
>
> I am extremely new to Spark. I have to read a file form HDFS, and get it
> in memory  in RDD format.
>
> I have a Java class as follows:
>
> class Person {
> private long UUID;
> private String FirstName;
> private String LastName;
> private String zip;
>
>// public methods
> }
>
> The file in HDFS is as follows:
>
> UUID   FirstName   LastName   Zip
> 7462   John        Doll       06903
> 5231   Brad        Finley     32820
>
>
> Can someone point me to how to get a JavaRDD<Person> object by reading the
> file in HDFS?
>
> Thanks.
>
> --
> Raghav
>



-- 
Raghav


Re: Newbie question - Best way to bootstrap with Spark

2016-11-14 Thread Jon Gregg
Piggybacking off this - how are you guys teaching DataFrames and Datasets
to new users?  I haven't taken the edx courses but I don't see Spark SQL
covered heavily in the syllabus.  I've dug through the Databricks
documentation but it's a lot of information for a new user I think - hoping
there is a video or course option instead.

On Mon, Nov 14, 2016 at 11:13 AM, Rishikesh Teke 
wrote:

> Integrate Spark with Apache Zeppelin (https://zeppelin.apache.org/); it's
> again a very handy way to bootstrap
> with Spark.
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Newbie-question-Best-way-to-bootstrap-with-Spark-
> tp28032p28069.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Newbie question - Best way to bootstrap with Spark

2016-11-14 Thread Rishikesh Teke
Integrate Spark with Apache Zeppelin (https://zeppelin.apache.org/); it's
again a very handy way to bootstrap
with Spark.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Best-way-to-bootstrap-with-Spark-tp28032p28069.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Newbie question - Best way to bootstrap with Spark

2016-11-10 Thread jggg777
A couple options:

(1) You can start locally by downloading Spark to your laptop:
http://spark.apache.org/downloads.html , then jump into the Quickstart docs:
http://spark.apache.org/docs/latest/quick-start.html

(2) There is a free Databricks community edition that runs on AWS:
https://databricks.com/try-databricks .  The databricks docs are publicly
available and have tutorial notebooks:
https://docs.cloud.databricks.com/docs/latest/databricks_guide/index.html

If you want to run it on a several node cluster for bigger data, it's pretty
easy through the AWS console to spin up an Elastic MapReduce cluster with
Spark pre-installed, but you'll need to sign up for an AWS account.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Best-way-to-bootstrap-with-Spark-tp28032p28061.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Newbie question - Best way to bootstrap with Spark

2016-11-07 Thread Raghav
Thanks a ton, guys.

On Sun, Nov 6, 2016 at 4:57 PM, raghav  wrote:

> I am newbie in the world of big data analytics, and I want to teach myself
> Apache Spark, and want to be able to write scripts to tinker with data.
>
> I have some understanding of Map Reduce but have not had a chance to get my
> hands dirty. There are tons of resources for Spark, but I am looking for
> some guidance for starter material, or videos.
>
> Thanks.
>
> Raghav
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Newbie-question-Best-way-to-
> bootstrap-with-Spark-tp28032.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


-- 
Raghav


Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Denny Lee
The one you're looking for is the Data Sciences and Engineering with Apache
Spark at
https://www.edx.org/xseries/data-science-engineering-apacher-sparktm.

Note, a great quick start is the Getting Started with Apache Spark on
Databricks at https://databricks.com/product/getting-started-guide

HTH!

On Sun, Nov 6, 2016 at 22:20 Raghav  wrote:

> Can you please point out the right courses from EDX/Berkeley ?
>
> Many thanks.
>
> On Sun, Nov 6, 2016 at 6:08 PM, ayan guha  wrote:
>
> I would start with Spark documentation, really. Then you would probably
> start with some older videos from youtube, especially spark summit
> 2014,2015 and 2016 videos. Regading practice, I would strongly suggest
> Databricks cloud (or download prebuilt from spark site). You can also take
> courses from EDX/Berkley, which are very good starter courses.
>
> On Mon, Nov 7, 2016 at 11:57 AM, raghav  wrote:
>
> I am newbie in the world of big data analytics, and I want to teach myself
> Apache Spark, and want to be able to write scripts to tinker with data.
>
> I have some understanding of Map Reduce but have not had a chance to get my
> hands dirty. There are tons of resources for Spark, but I am looking for
> some guidance for starter material, or videos.
>
> Thanks.
>
> Raghav
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Best-way-to-bootstrap-with-Spark-tp28032.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
>
>


Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Raghav
Can you please point out the right courses from EDX/Berkeley ?

Many thanks.

On Sun, Nov 6, 2016 at 6:08 PM, ayan guha  wrote:

> I would start with Spark documentation, really. Then you would probably
> start with some older videos from youtube, especially spark summit
> 2014,2015 and 2016 videos. Regading practice, I would strongly suggest
> Databricks cloud (or download prebuilt from spark site). You can also take
> courses from EDX/Berkley, which are very good starter courses.
>
> On Mon, Nov 7, 2016 at 11:57 AM, raghav  wrote:
>
>> I am newbie in the world of big data analytics, and I want to teach myself
>> Apache Spark, and want to be able to write scripts to tinker with data.
>>
>> I have some understanding of Map Reduce but have not had a chance to get
>> my
>> hands dirty. There are tons of resources for Spark, but I am looking for
>> some guidance for starter material, or videos.
>>
>> Thanks.
>>
>> Raghav
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.
>> 1001560.n3.nabble.com/Newbie-question-Best-way-to-bootstrap-
>> with-Spark-tp28032.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>


Re: Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread warmb...@qq.com
EDX/Berkley +1



___
Huang Pengcheng (HuangPengCheng)
China Minsheng Bank, Head Office Technology Development Department, DBA Group & Application Operations Center 4
*Standardized operation, proactive maintenance, prompt handling*
Gentle, kind, courteous, restrained, magnanimous**
Address: China Minsheng Bank headquarters campus, Shun'an South Road, Shunyi District, Beijing
Postal code: 101300
Tel: 010-56361701
Mobile: 13488788499
Email:huangpengch...@cmbc.com.cn ,gnu...@gmail.com
 
From: ayan guha
Date: 2016-11-07 10:08
To: raghav
CC: user
Subject: Re: Newbie question - Best way to bootstrap with Spark
I would start with the Spark documentation, really. Then you would probably start 
with some older videos from YouTube, especially the Spark Summit 2014, 2015 and 2016 
videos. Regarding practice, I would strongly suggest Databricks cloud (or 
download a prebuilt Spark from the Spark site). You can also take courses from 
EdX/Berkeley, which are very good starter courses. 

On Mon, Nov 7, 2016 at 11:57 AM, raghav <raghavas...@gmail.com> wrote:
I am newbie in the world of big data analytics, and I want to teach myself
Apache Spark, and want to be able to write scripts to tinker with data.

I have some understanding of Map Reduce but have not had a chance to get my
hands dirty. There are tons of resources for Spark, but I am looking for
some guidance for starter material, or videos.

Thanks.

Raghav



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Best-way-to-bootstrap-with-Spark-tp28032.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org




-- 
Best Regards,
Ayan Guha


Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread ayan guha
I would start with the Spark documentation, really. Then you would probably
start with some older videos from YouTube, especially the Spark Summit
2014, 2015 and 2016 videos. Regarding practice, I would strongly suggest
Databricks cloud (or download a prebuilt Spark from the Spark site). You can also
take courses from EdX/Berkeley, which are very good starter courses.

On Mon, Nov 7, 2016 at 11:57 AM, raghav  wrote:

> I am newbie in the world of big data analytics, and I want to teach myself
> Apache Spark, and want to be able to write scripts to tinker with data.
>
> I have some understanding of Map Reduce but have not had a chance to get my
> hands dirty. There are tons of resources for Spark, but I am looking for
> some guidance for starter material, or videos.
>
> Thanks.
>
> Raghav
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Newbie-question-Best-way-to-
> bootstrap-with-Spark-tp28032.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


-- 
Best Regards,
Ayan Guha


Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Tristan Nixon
Right, well I don’t think the issue is with how you’re compiling the scala. I 
think it’s a conflict between different versions of several libs.
I had similar issues with my spark modules. You need to make sure you’re not 
loading a different version of the same lib that is clobbering another 
dependency. It’s very frustrating, but with patience you can weed them out. 
You’ll want to find the offending libs and put them into an <exclusions> block 
under the associated dependency. I am still working with spark 1.5, scala 2.10, 
and for me the presence of scalap was the problem, and this resolved it:

<dependency>
 <groupId>org.apache.spark</groupId>
 <artifactId>spark-core_2.10</artifactId>
 <version>1.5.1</version>
 <exclusions>
  <exclusion>
   <groupId>org.json4s</groupId>
   <artifactId>json4s-core_2.10</artifactId>
  </exclusion>
 </exclusions>
</dependency>

<dependency>
 <groupId>org.json4s</groupId>
 <artifactId>json4s-core_2.10</artifactId>
 <version>3.2.10</version>
 <exclusions>
  <exclusion>
   <groupId>org.scala-lang</groupId>
   <artifactId>scalap</artifactId>
  </exclusion>
 </exclusions>
</dependency>


Unfortunately scalap is a dependency of json4s, which I want to keep. So what I 
do is exclude json4s from spark-core, then add it back in, but with its 
troublesome scalap dependency removed.


> On Mar 11, 2016, at 6:34 PM, Vasu Parameswaran  wrote:
> 
> Added these to the pom and still the same error :-(. I will look into sbt as 
> well.
> 
> 
> 
> On Fri, Mar 11, 2016 at 2:31 PM, Tristan Nixon  > wrote:
> You must be relying on IntelliJ to compile your scala, because you haven’t 
> set up any scala plugin to compile it from maven.
> You should have something like this in your plugins:
> 
> 
>  
>   net.alchim31.maven
>   scala-maven-plugin
>   
>
> scala-compile-first
> process-resources
> 
>  compile
> 
>
>
> scala-test-compile
> process-test-resources
> 
>  testCompile
> 
>
>   
>  
> 
> 
> PS - I use maven to compile all my scala and haven’t had a problem with it. I 
> know that sbt has some wonderful things, but I’m just set in my ways ;)
> 
>> On Mar 11, 2016, at 2:02 PM, Jacek Laskowski > > wrote:
>> 
>> Hi,
>> 
>> Doh! My eyes are bleeding to go through XMLs... 
>> 
>> Where did you specify Scala version? Dunno how it's in maven.
>> 
>> p.s. I *strongly* recommend sbt.
>> 
>> Jacek
>> 
>> 11.03.2016 8:04 PM "Vasu Parameswaran" > > napisał(a):
>> Thanks Jacek.  Pom is below (Currenlty set to 1.6.1 spark but I started out 
>> with 1.6.0 with the same problem).
>> 
>> 
>> 
>> http://maven.apache.org/POM/4.0.0 
>> "
>>  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance 
>> "
>>  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
>>  
>> http://maven.apache.org/xsd/maven-4.0.0.xsd 
>> ">
>> 
>> spark
>> com.test
>> 1.0-SNAPSHOT
>> 
>> 4.0.0
>> 
>> sparktest
>> 
>> 
>> UTF-8
>> 
>> 
>> 
>> 
>> junit
>> junit
>> 
>> 
>> 
>> commons-cli
>> commons-cli
>> 
>> 
>> com.google.code.gson
>> gson
>> 2.3.1
>> compile
>> 
>> 
>> org.apache.spark
>> spark-core_2.11
>> 1.6.1
>> 
>> 
>> 
>> 
>> 
>> 
>> org.apache.maven.plugins
>> maven-shade-plugin
>> 2.4.2
>> 
>> 
>> package
>> 
>> shade
>> 
>> 
>> 
>> 
>> 
>> ${project.artifactId}-${project.version}-with-dependencies
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Fri, Mar 11, 2016 at 10:46 AM, Jacek Laskowski > > wrote:
>> Hi,
>> 
>> Why do you use maven not sbt for Scala?
>> 
>> Can you show the entire pom.xml and the command to execute the app?
>> 
>> Jacek
>> 
>> 11.03.2016 7:33 PM "vasu20" > 
>> napisał(a):
>> Hi
>> 
>> Any help appreciated on this.  I am trying to write a Spark program using
>> IntelliJ.  I get a run time error as soon as new SparkConf() is called from
>> main.  Top few lines of the exception are pasted below.
>> 
>> These are the following versions:
>> 
>> Spark jar:  spark-assembly-1.6.0-hadoop2.6.0.jar
>> pom:  spark-core_2.11
>>  1.6.0
>> 
>> I have installed the Scala plugin in IntelliJ and added a dependency.
>> 
>> I have also added a library dependency in the project structure.
>> 
>> Thanks for any help!
>> 
>> Vasu
>> 
>> 
>> Exception in thread "main" java.lang.NoSuchMethodError:
>> scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
>> at org.apache.spark.util.Utils$.(Utils.scala:1682)
>> at 

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran
Added these to the pom and still the same error :-(. I will look into sbt
as well.



On Fri, Mar 11, 2016 at 2:31 PM, Tristan Nixon 
wrote:

> You must be relying on IntelliJ to compile your scala, because you haven’t
> set up any scala plugin to compile it from maven.
> You should have something like this in your plugins:
>
> 
>  
>   net.alchim31.maven
>   scala-maven-plugin
>   
>
> scala-compile-first
> process-resources
> 
>  compile
> 
>
>
> scala-test-compile
> process-test-resources
> 
>  testCompile
> 
>
>   
>  
> 
>
>
> PS - I use maven to compile all my scala and haven’t had a problem with
> it. I know that sbt has some wonderful things, but I’m just set in my ways
> ;)
>
> On Mar 11, 2016, at 2:02 PM, Jacek Laskowski  wrote:
>
> Hi,
>
> Doh! My eyes are bleeding to go through XMLs... 
>
> Where did you specify Scala version? Dunno how it's in maven.
>
> p.s. I *strongly* recommend sbt.
>
> Jacek
> 11.03.2016 8:04 PM "Vasu Parameswaran"  napisał(a):
>
>> Thanks Jacek.  Pom is below (Currenlty set to 1.6.1 spark but I started
>> out with 1.6.0 with the same problem).
>>
>>
>> 
>> http://maven.apache.org/POM/4.0.0;
>>  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
>>  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>> http://maven.apache.org/xsd/maven-4.0.0.xsd;>
>> 
>> spark
>> com.test
>> 1.0-SNAPSHOT
>> 
>> 4.0.0
>>
>> sparktest
>>
>> 
>> UTF-8
>> 
>>
>> 
>> 
>> junit
>> junit
>> 
>>
>> 
>> commons-cli
>> commons-cli
>> 
>> 
>> com.google.code.gson
>> gson
>> 2.3.1
>> compile
>> 
>> 
>> org.apache.spark
>> spark-core_2.11
>> 1.6.1
>> 
>> 
>>
>> 
>> 
>> 
>> org.apache.maven.plugins
>> maven-shade-plugin
>> 2.4.2
>> 
>> 
>> package
>> 
>> shade
>> 
>> 
>> 
>> 
>>
>> ${project.artifactId}-${project.version}-with-dependencies
>> 
>> 
>> 
>> 
>>
>> 
>>
>>
>>
>> On Fri, Mar 11, 2016 at 10:46 AM, Jacek Laskowski 
>> wrote:
>>
>>> Hi,
>>>
>>> Why do you use maven not sbt for Scala?
>>>
>>> Can you show the entire pom.xml and the command to execute the app?
>>>
>>> Jacek
>>> 11.03.2016 7:33 PM "vasu20"  napisał(a):
>>>
 Hi

 Any help appreciated on this.  I am trying to write a Spark program
 using
 IntelliJ.  I get a run time error as soon as new SparkConf() is called
 from
 main.  Top few lines of the exception are pasted below.

 These are the following versions:

 Spark jar:  spark-assembly-1.6.0-hadoop2.6.0.jar
 pom:  spark-core_2.11
  1.6.0

 I have installed the Scala plugin in IntelliJ and added a dependency.

 I have also added a library dependency in the project structure.

 Thanks for any help!

 Vasu


 Exception in thread "main" java.lang.NoSuchMethodError:
 scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
 at org.apache.spark.util.Utils$.(Utils.scala:1682)
 at org.apache.spark.util.Utils$.(Utils.scala)
 at org.apache.spark.SparkConf.(SparkConf.scala:59)






 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Help-with-runtime-error-on-augmentString-tp26462.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com
 .

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


>>
>


Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Tristan Nixon
You must be relying on IntelliJ to compile your scala, because you haven’t set 
up any scala plugin to compile it from maven.
You should have something like this in your plugins:


<plugins>
 <plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <executions>
   <execution>
    <id>scala-compile-first</id>
    <phase>process-resources</phase>
    <goals>
     <goal>compile</goal>
    </goals>
   </execution>
   <execution>
    <id>scala-test-compile</id>
    <phase>process-test-resources</phase>
    <goals>
     <goal>testCompile</goal>
    </goals>
   </execution>
  </executions>
 </plugin>
</plugins>


PS - I use maven to compile all my scala and haven’t had a problem with it. I 
know that sbt has some wonderful things, but I’m just set in my ways ;)

> On Mar 11, 2016, at 2:02 PM, Jacek Laskowski  wrote:
> 
> Hi,
> 
> Doh! My eyes are bleeding to go through XMLs... 
> 
> Where did you specify Scala version? Dunno how it's in maven.
> 
> p.s. I *strongly* recommend sbt.
> 
> Jacek
> 
> 11.03.2016 8:04 PM "Vasu Parameswaran"  > napisał(a):
> Thanks Jacek.  Pom is below (Currenlty set to 1.6.1 spark but I started out 
> with 1.6.0 with the same problem).
> 
> 
> 
> http://maven.apache.org/POM/4.0.0 
> "
>  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance 
> "
>  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
>  
> http://maven.apache.org/xsd/maven-4.0.0.xsd 
> ">
> 
> spark
> com.test
> 1.0-SNAPSHOT
> 
> 4.0.0
> 
> sparktest
> 
> 
> UTF-8
> 
> 
> 
> 
> junit
> junit
> 
> 
> 
> commons-cli
> commons-cli
> 
> 
> com.google.code.gson
> gson
> 2.3.1
> compile
> 
> 
> org.apache.spark
> spark-core_2.11
> 1.6.1
> 
> 
> 
> 
> 
> 
> org.apache.maven.plugins
> maven-shade-plugin
> 2.4.2
> 
> 
> package
> 
> shade
> 
> 
> 
> 
> 
> ${project.artifactId}-${project.version}-with-dependencies
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Fri, Mar 11, 2016 at 10:46 AM, Jacek Laskowski  > wrote:
> Hi,
> 
> Why do you use maven not sbt for Scala?
> 
> Can you show the entire pom.xml and the command to execute the app?
> 
> Jacek
> 
> 11.03.2016 7:33 PM "vasu20" > 
> napisał(a):
> Hi
> 
> Any help appreciated on this.  I am trying to write a Spark program using
> IntelliJ.  I get a run time error as soon as new SparkConf() is called from
> main.  Top few lines of the exception are pasted below.
> 
> These are the following versions:
> 
> Spark jar:  spark-assembly-1.6.0-hadoop2.6.0.jar
> pom:  spark-core_2.11
>  1.6.0
> 
> I have installed the Scala plugin in IntelliJ and added a dependency.
> 
> I have also added a library dependency in the project structure.
> 
> Thanks for any help!
> 
> Vasu
> 
> 
> Exception in thread "main" java.lang.NoSuchMethodError:
> scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
> at org.apache.spark.util.Utils$.(Utils.scala:1682)
> at org.apache.spark.util.Utils$.(Utils.scala)
> at org.apache.spark.SparkConf.(SparkConf.scala:59)
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Help-with-runtime-error-on-augmentString-tp26462.html
>  
> 
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
> 
> For additional commands, e-mail: user-h...@spark.apache.org 
> 
> 
> 



Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Jacek Laskowski
Hi,

Doh! My eyes are bleeding to go through XMLs... 

Where did you specify Scala version? Dunno how it's in maven.

p.s. I *strongly* recommend sbt.

Jacek
11.03.2016 8:04 PM "Vasu Parameswaran"  napisał(a):

> Thanks Jacek.  Pom is below (Currenlty set to 1.6.1 spark but I started
> out with 1.6.0 with the same problem).
>
>
> 
> http://maven.apache.org/POM/4.0.0;
>  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
>  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
> http://maven.apache.org/xsd/maven-4.0.0.xsd;>
> 
> spark
> com.test
> 1.0-SNAPSHOT
> 
> 4.0.0
>
> sparktest
>
> 
> UTF-8
> 
>
> 
> 
> junit
> junit
> 
>
> 
> commons-cli
> commons-cli
> 
> 
> com.google.code.gson
> gson
> 2.3.1
> compile
> 
> 
> org.apache.spark
> spark-core_2.11
> 1.6.1
> 
> 
>
> 
> 
> 
> org.apache.maven.plugins
> maven-shade-plugin
> 2.4.2
> 
> 
> package
> 
> shade
> 
> 
> 
> 
>
> ${project.artifactId}-${project.version}-with-dependencies
> 
> 
> 
> 
>
> 
>
>
>
> On Fri, Mar 11, 2016 at 10:46 AM, Jacek Laskowski  wrote:
>
>> Hi,
>>
>> Why do you use maven not sbt for Scala?
>>
>> Can you show the entire pom.xml and the command to execute the app?
>>
>> Jacek
>> 11.03.2016 7:33 PM "vasu20"  napisał(a):
>>
>>> Hi
>>>
>>> Any help appreciated on this.  I am trying to write a Spark program using
>>> IntelliJ.  I get a run time error as soon as new SparkConf() is called
>>> from
>>> main.  Top few lines of the exception are pasted below.
>>>
>>> These are the following versions:
>>>
>>> Spark jar:  spark-assembly-1.6.0-hadoop2.6.0.jar
>>> pom:  spark-core_2.11
>>>  1.6.0
>>>
>>> I have installed the Scala plugin in IntelliJ and added a dependency.
>>>
>>> I have also added a library dependency in the project structure.
>>>
>>> Thanks for any help!
>>>
>>> Vasu
>>>
>>>
>>> Exception in thread "main" java.lang.NoSuchMethodError:
>>> scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
>>> at org.apache.spark.util.Utils$.(Utils.scala:1682)
>>> at org.apache.spark.util.Utils$.(Utils.scala)
>>> at org.apache.spark.SparkConf.(SparkConf.scala:59)
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Help-with-runtime-error-on-augmentString-tp26462.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>


Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran
Thanks Jacek.  Pom is below (currently set to Spark 1.6.1, but I started out
with 1.6.0 with the same problem).



<project xmlns="http://maven.apache.org/POM/4.0.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>spark</artifactId>
        <groupId>com.test</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>sparktest</artifactId>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
        </dependency>

        <dependency>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
        </dependency>
        <dependency>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
            <version>2.3.1</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>1.6.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.2</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <finalName>${project.artifactId}-${project.version}-with-dependencies</finalName>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

On Fri, Mar 11, 2016 at 10:46 AM, Jacek Laskowski  wrote:

> Hi,
>
> Why do you use maven not sbt for Scala?
>
> Can you show the entire pom.xml and the command to execute the app?
>
> Jacek
> 11.03.2016 7:33 PM "vasu20"  napisał(a):
>
>> Hi
>>
>> Any help appreciated on this.  I am trying to write a Spark program using
>> IntelliJ.  I get a run time error as soon as new SparkConf() is called
>> from
>> main.  Top few lines of the exception are pasted below.
>>
>> These are the following versions:
>>
>> Spark jar:  spark-assembly-1.6.0-hadoop2.6.0.jar
>> pom:  spark-core_2.11
>>  1.6.0
>>
>> I have installed the Scala plugin in IntelliJ and added a dependency.
>>
>> I have also added a library dependency in the project structure.
>>
>> Thanks for any help!
>>
>> Vasu
>>
>>
>> Exception in thread "main" java.lang.NoSuchMethodError:
>> scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
>> at org.apache.spark.util.Utils$.(Utils.scala:1682)
>> at org.apache.spark.util.Utils$.(Utils.scala)
>> at org.apache.spark.SparkConf.(SparkConf.scala:59)
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Help-with-runtime-error-on-augmentString-tp26462.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>


Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran
Thanks Ted.

I haven't explicitly specified Scala (I tried different versions in pom.xml
as well).

For what it is worth, this is what I get when I do a maven dependency
tree.  I wonder if the 2.11.2 coming from scala-reflect matters:


[INFO] |  | \- org.scala-lang:scalap:jar:2.11.0:compile
[INFO] |  |\- org.scala-lang:scala-compiler:jar:2.11.0:compile
[INFO] |  |   +-
org.scala-lang.modules:scala-xml_2.11:jar:1.0.1:compile
[INFO] |  |   \-
org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.1:compile
[INFO] |  +-
com.fasterxml.jackson.module:jackson-module-scala_2.11:jar:2.4.4:compile
[INFO] |  |  +- org.scala-lang:scala-reflect:jar:2.11.2:compile
[INFO] \- org.scala-lang:scala-library:jar:2.11.0:compile



On Fri, Mar 11, 2016 at 10:38 AM, Ted Yu  wrote:

> Looks like Scala version mismatch.
>
> Are you using 2.11 everywhere ?
>
> On Fri, Mar 11, 2016 at 10:33 AM, vasu20  wrote:
>
>> Hi
>>
>> Any help appreciated on this.  I am trying to write a Spark program using
>> IntelliJ.  I get a run time error as soon as new SparkConf() is called
>> from
>> main.  Top few lines of the exception are pasted below.
>>
>> These are the following versions:
>>
>> Spark jar:  spark-assembly-1.6.0-hadoop2.6.0.jar
>> pom:  spark-core_2.11
>>  1.6.0
>>
>> I have installed the Scala plugin in IntelliJ and added a dependency.
>>
>> I have also added a library dependency in the project structure.
>>
>> Thanks for any help!
>>
>> Vasu
>>
>>
>> Exception in thread "main" java.lang.NoSuchMethodError:
>> scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
>> at org.apache.spark.util.Utils$.(Utils.scala:1682)
>> at org.apache.spark.util.Utils$.(Utils.scala)
>> at org.apache.spark.SparkConf.(SparkConf.scala:59)
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Help-with-runtime-error-on-augmentString-tp26462.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Jacek Laskowski
Hi,

Why do you use maven not sbt for Scala?

Can you show the entire pom.xml and the command to execute the app?

Jacek
11.03.2016 7:33 PM "vasu20"  napisał(a):

> Hi
>
> Any help appreciated on this.  I am trying to write a Spark program using
> IntelliJ.  I get a run time error as soon as new SparkConf() is called from
> main.  Top few lines of the exception are pasted below.
>
> These are the following versions:
>
> Spark jar:  spark-assembly-1.6.0-hadoop2.6.0.jar
> pom:  spark-core_2.11
>  1.6.0
>
> I have installed the Scala plugin in IntelliJ and added a dependency.
>
> I have also added a library dependency in the project structure.
>
> Thanks for any help!
>
> Vasu
>
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
> at org.apache.spark.util.Utils$.(Utils.scala:1682)
> at org.apache.spark.util.Utils$.(Utils.scala)
> at org.apache.spark.SparkConf.(SparkConf.scala:59)
>
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Help-with-runtime-error-on-augmentString-tp26462.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Ted Yu
Looks like Scala version mismatch.

Are you using 2.11 everywhere ?
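
For example (a sketch only, assuming the pre-built
spark-assembly-1.6.0-hadoop2.6.0.jar, which is compiled against Scala 2.10),
the dependency would need to be the matching _2.10 artifact -- or, staying on
_2.11, every Spark/Scala artifact and the assembly itself would need to be a
2.11 build:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.0</version>
</dependency>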

On Fri, Mar 11, 2016 at 10:33 AM, vasu20  wrote:

> Hi
>
> Any help appreciated on this.  I am trying to write a Spark program using
> IntelliJ.  I get a run time error as soon as new SparkConf() is called from
> main.  Top few lines of the exception are pasted below.
>
> These are the following versions:
>
> Spark jar:  spark-assembly-1.6.0-hadoop2.6.0.jar
> pom:  spark-core_2.11
>  1.6.0
>
> I have installed the Scala plugin in IntelliJ and added a dependency.
>
> I have also added a library dependency in the project structure.
>
> Thanks for any help!
>
> Vasu
>
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
> at org.apache.spark.util.Utils$.(Utils.scala:1682)
> at org.apache.spark.util.Utils$.(Utils.scala)
> at org.apache.spark.SparkConf.(SparkConf.scala:59)
>
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Help-with-runtime-error-on-augmentString-tp26462.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Newbie question

2016-01-07 Thread Deepak Sharma
Yes, you can do it unless the method is marked static/final.
Most of the methods in SparkContext are marked static, so you can't
override them; otherwise, overriding would usually work.

Thanks
Deepak

On Fri, Jan 8, 2016 at 12:06 PM, yuliya Feldman  wrote:

> Hello,
>
> I am new to Spark and have a most likely basic question - can I override a
> method from SparkContext?
>
> Thanks
>



-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Re: Newbie question

2016-01-07 Thread censj
You can try it.  
> On Jan 8, 2016, at 14:44, yuliya Feldman wrote: 
> 
> invoked



Re: Newbie question

2016-01-07 Thread yuliya Feldman
For example, to add some functionality there.
I understand I can extend SparkContext via an implicit class to add new 
methods that can be invoked on SparkContext, but I want to see if I can 
override an existing one.
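
For reference, a minimal sketch of that implicit-class approach (the names
here are made up for illustration):

import org.apache.spark.SparkContext

object SparkContextExtensions {
  implicit class RichSparkContext(val sc: SparkContext) extends AnyVal {
    // an added helper method, not an override of anything on SparkContext
    def upperTextFile(path: String) = sc.textFile(path).map(_.toUpperCase)
  }
}

// usage: import SparkContextExtensions._, then sc.upperTextFile("hdfs://...")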

 

  From: censj <ce...@lotuseed.com>
 To: yuliya Feldman <yufeld...@yahoo.com> 
Cc: "user@spark.apache.org" <user@spark.apache.org>
 Sent: Thursday, January 7, 2016 10:38 PM
 Subject: Re: Newbie question
   
Why do you want to override a method from SparkContext?
On Jan 8, 2016, at 14:36, yuliya Feldman <yufeld...@yahoo.com.INVALID> wrote:
Hello,
I am new to Spark and have a most likely basic question - can I override a 
method from SparkContext?
Thanks



  

Re: Newbie question

2016-01-07 Thread yuliya Feldman
Thank you
 

  From: Deepak Sharma <deepakmc...@gmail.com>
 To: yuliya Feldman <yufeld...@yahoo.com> 
Cc: "user@spark.apache.org" <user@spark.apache.org>
 Sent: Thursday, January 7, 2016 10:41 PM
 Subject: Re: Newbie question
   
Yes , you can do it unless the method is marked static/final. Most of the 
methods in SparkContext are marked static so you can't over ride them 
definitely , else over ride would work usually.
Thanks
Deepak
On Fri, Jan 8, 2016 at 12:06 PM, yuliya Feldman <yufeld...@yahoo.com.invalid> 
wrote:

Hello,
I am new to Spark and have a most likely basic question - can I override a 
method from SparkContext?
Thanks



-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net

   

Re: Newbie question

2016-01-07 Thread dEEPU
If the method is not final or static then you can.
On Jan 8, 2016 12:07 PM, yuliya Feldman  wrote:
Hello,
I am new to Spark and have a most likely basic question - can I override a 
method from SparkContext?
Thanks


Re: Newbie question: what makes Spark run faster than MapReduce

2015-08-07 Thread Hien Luu
This blog outlines a few things that make Spark faster than MapReduce -
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html

On Fri, Aug 7, 2015 at 9:13 AM, Muler mulugeta.abe...@gmail.com wrote:

 Consider the classic word count application over a 4-node cluster with a
 sizable working data set. What makes Spark run faster than MapReduce,
 considering that Spark also has to write to disk during shuffle?



Re: Newbie question: what makes Spark run faster than MapReduce

2015-08-07 Thread Corey Nolet
1) Spark only needs to shuffle when data needs to be repartitioned across the
workers in an all-to-all fashion.
2) Multi-stage jobs that would normally require several MapReduce jobs (and
therefore dump intermediate data to disk between jobs) can instead keep that
data cached in memory.
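
A small illustrative sketch of point 2 (not from the original message; the
path is just an example): the intermediate word counts are cached, so the
second action reuses them from memory instead of recomputing or writing them
out between separate jobs, as chained MapReduce jobs would.

val counts = sc.textFile("hdfs:///tmp/input.txt")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .cache()

counts.count()                       // first action materializes and caches
counts.filter(_._2 > 100).count()    // second action reads the cached RDD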


Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Muler
Thanks!

On Wed, Aug 5, 2015 at 5:24 PM, Saisai Shao sai.sai.s...@gmail.com wrote:

 Yes, finally shuffle data will be written to disk for reduce stage to
 pull, no matter how large you set to shuffle memory fraction.

 Thanks
 Saisai

 On Thu, Aug 6, 2015 at 7:50 AM, Muler mulugeta.abe...@gmail.com wrote:

 thanks, so if I have enough large memory (with enough
 spark.shuffle.memory) then shuffle (in-memory shuffle) spill doesn't happen
 (per node) but still shuffle data has to be ultimately written to disk so
 that reduce stage pulls if across network?

 On Wed, Aug 5, 2015 at 4:40 PM, Saisai Shao sai.sai.s...@gmail.com
 wrote:

 Hi Muler,

 Shuffle data will be written to disk, no matter how large memory you
 have, large memory could alleviate shuffle spill where temporary file will
 be generated if memory is not enough.

 Yes, each node writes shuffle data to file and pulled from disk in
 reduce stage from network framework (default is Netty).

 Thanks
 Saisai

 On Thu, Aug 6, 2015 at 7:10 AM, Muler mulugeta.abe...@gmail.com wrote:

 Hi,

 Consider I'm running WordCount with 100m of data on 4 node cluster.
 Assuming my RAM size on each node is 200g and i'm giving my executors 100g
 (just enough memory for 100m data)


1. If I have enough memory, can Spark 100% avoid writing to disk?
2. During shuffle, where results have to be collected from nodes,
does each node write to disk and then the results are pulled from disk? 
 If
not, what is the API that is being used to pull data from nodes across 
 the
cluster? (I'm thinking what Scala or Java packages would allow you to 
 read
in-memory data from other machines?)

 Thanks,







Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Saisai Shao
Hi Muler,

Shuffle data will be written to disk, no matter how large memory you have,
large memory could alleviate shuffle spill where temporary file will be
generated if memory is not enough.

Yes, each node writes shuffle data to file and pulled from disk in reduce
stage from network framework (default is Netty).

Thanks
Saisai

On Thu, Aug 6, 2015 at 7:10 AM, Muler mulugeta.abe...@gmail.com wrote:

 Hi,

 Consider I'm running WordCount with 100m of data on 4 node cluster.
 Assuming my RAM size on each node is 200g and i'm giving my executors 100g
 (just enough memory for 100m data)


1. If I have enough memory, can Spark 100% avoid writing to disk?
2. During shuffle, where results have to be collected from nodes, does
each node write to disk and then the results are pulled from disk? If not,
what is the API that is being used to pull data from nodes across the
cluster? (I'm thinking what Scala or Java packages would allow you to read
in-memory data from other machines?)

 Thanks,



Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Muler
thanks, so if I have enough memory (with a large enough shuffle memory
fraction) then shuffle (in-memory shuffle) spill doesn't happen (per node),
but shuffle data still has to ultimately be written to disk so that the
reduce stage can pull it across the network?

On Wed, Aug 5, 2015 at 4:40 PM, Saisai Shao sai.sai.s...@gmail.com wrote:

 Hi Muler,

 Shuffle data will be written to disk, no matter how large memory you have,
 large memory could alleviate shuffle spill where temporary file will be
 generated if memory is not enough.

 Yes, each node writes shuffle data to file and pulled from disk in reduce
 stage from network framework (default is Netty).

 Thanks
 Saisai

 On Thu, Aug 6, 2015 at 7:10 AM, Muler mulugeta.abe...@gmail.com wrote:

 Hi,

 Consider I'm running WordCount with 100m of data on 4 node cluster.
 Assuming my RAM size on each node is 200g and i'm giving my executors 100g
 (just enough memory for 100m data)


1. If I have enough memory, can Spark 100% avoid writing to disk?
2. During shuffle, where results have to be collected from nodes,
does each node write to disk and then the results are pulled from disk? If
not, what is the API that is being used to pull data from nodes across the
cluster? (I'm thinking what Scala or Java packages would allow you to read
in-memory data from other machines?)

 Thanks,





Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Saisai Shao
Yes, finally shuffle data will be written to disk for reduce stage to pull,
no matter how large you set to shuffle memory fraction.

Thanks
Saisai

On Thu, Aug 6, 2015 at 7:50 AM, Muler mulugeta.abe...@gmail.com wrote:

 thanks, so if I have enough large memory (with enough
 spark.shuffle.memory) then shuffle (in-memory shuffle) spill doesn't happen
 (per node) but still shuffle data has to be ultimately written to disk so
 that reduce stage pulls if across network?

 On Wed, Aug 5, 2015 at 4:40 PM, Saisai Shao sai.sai.s...@gmail.com
 wrote:

 Hi Muler,

 Shuffle data will be written to disk, no matter how large memory you
 have, large memory could alleviate shuffle spill where temporary file will
 be generated if memory is not enough.

 Yes, each node writes shuffle data to file and pulled from disk in reduce
 stage from network framework (default is Netty).

 Thanks
 Saisai

 On Thu, Aug 6, 2015 at 7:10 AM, Muler mulugeta.abe...@gmail.com wrote:

 Hi,

 Consider I'm running WordCount with 100m of data on 4 node cluster.
 Assuming my RAM size on each node is 200g and i'm giving my executors 100g
 (just enough memory for 100m data)


1. If I have enough memory, can Spark 100% avoid writing to disk?
2. During shuffle, where results have to be collected from nodes,
does each node write to disk and then the results are pulled from disk? 
 If
not, what is the API that is being used to pull data from nodes across 
 the
cluster? (I'm thinking what Scala or Java packages would allow you to 
 read
in-memory data from other machines?)

 Thanks,






Re: Newbie Question on How Tasks are Executed

2015-01-19 Thread davidkl
Hello Mixtou, if you want to look at partition ID, I believe you want to use
mapPartitionsWithIndex
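
A minimal sketch of what that looks like (illustrative values only):

val rdd = sc.parallelize(1 to 10, 3)
val tagged = rdd.mapPartitionsWithIndex { (partitionId, iter) =>
  iter.map(x => (partitionId, x))     // tag each element with its partition ID
}
tagged.collect().foreach(println)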





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-Question-on-How-Tasks-are-Executed-tp21064p21228.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: newbie question quickstart example sbt issue

2014-10-28 Thread Yanbo Liang
Maybe you had wrong configuration of sbt proxy.

2014-10-28 18:27 GMT+08:00 nl19856 hanspeter.sl...@gmail.com:

 Hi,
 I have downloaded the binary spark distribution.
 When building the package with sbt package I get the following:
 [root@nlvora157 ~]# sbt package
 [info] Set current project to Simple Project (in build file:/root/)
 [info] Updating {file:/root/}root...
 [info] Resolving org.apache.spark#spark-core_2.10;1.1.0 ...
 [warn] Host repo1.maven.org not found.
 url=
 https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.pom
 [info] You probably access the destination server through a proxy server
 that is not well configured.
 [warn]  module not found: org.apache.spark#spark-core_2.10;1.1.0
 [warn]  local: tried
 [warn]
 /root/.ivy2/local/org.apache.spark/spark-core_2.10/1.1.0/ivys/ivy.xml
 [warn]  public: tried
 [warn]

 https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.pom
 [info] Resolving org.fusesource.jansi#jansi;1.4 ...
 [warn]  ::
 [warn]  ::  UNRESOLVED DEPENDENCIES ::
 [warn]  ::
 [warn]  :: org.apache.spark#spark-core_2.10;1.1.0: not found
 [warn]  ::
 [warn]
 [warn]  Note: Unresolved dependencies path:
 [warn]  org.apache.spark:spark-core_2.10:1.1.0
 (/root/simple.sbt#L7-8)
 [warn]+- simple-project:simple-project_2.10:1.0
 sbt.ResolveException: unresolved dependency:
 org.apache.spark#spark-core_2.10;1.1.0: not found

 What am I doing wrong?

 Regards Hans-Peter




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/newbie-question-quickstart-example-sbt-issue-tp17477.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: newbie question quickstart example sbt issue

2014-10-28 Thread nl19856
Sigh!
Sorry I did not read the error message properly.


2014-10-28 11:39 GMT+01:00 Yanbo Liang [via Apache Spark User List] 
ml-node+s1001560n17478...@n3.nabble.com:

 Maybe you had wrong configuration of sbt proxy.

 2014-10-28 18:27 GMT+08:00 nl19856 [hidden email]
 http://user/SendEmail.jtp?type=nodenode=17478i=0:

 Hi,
 I have downloaded the binary spark distribution.
 When building the package with sbt package I get the following:
 [root@nlvora157 ~]# sbt package
 [info] Set current project to Simple Project (in build file:/root/)
 [info] Updating {file:/root/}root...
 [info] Resolving org.apache.spark#spark-core_2.10;1.1.0 ...
 [warn] Host repo1.maven.org not found.
 url=
 https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.pom
 [info] You probably access the destination server through a proxy server
 that is not well configured.
 [warn]  module not found: org.apache.spark#spark-core_2.10;1.1.0
 [warn]  local: tried
 [warn]
 /root/.ivy2/local/org.apache.spark/spark-core_2.10/1.1.0/ivys/ivy.xml
 [warn]  public: tried
 [warn]

 https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.pom
 [info] Resolving org.fusesource.jansi#jansi;1.4 ...
 [warn]  ::
 [warn]  ::  UNRESOLVED DEPENDENCIES ::
 [warn]  ::
 [warn]  :: org.apache.spark#spark-core_2.10;1.1.0: not found
 [warn]  ::
 [warn]
 [warn]  Note: Unresolved dependencies path:
 [warn]  org.apache.spark:spark-core_2.10:1.1.0
 (/root/simple.sbt#L7-8)
 [warn]+- simple-project:simple-project_2.10:1.0
 sbt.ResolveException: unresolved dependency:
 org.apache.spark#spark-core_2.10;1.1.0: not found

 What am I doing wrong?

 Regards Hans-Peter




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/newbie-question-quickstart-example-sbt-issue-tp17477.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: [hidden email]
 http://user/SendEmail.jtp?type=nodenode=17478i=1
 For additional commands, e-mail: [hidden email]
 http://user/SendEmail.jtp?type=nodenode=17478i=2









--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/newbie-question-quickstart-example-sbt-issue-tp17477p17479.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: newbie question quickstart example sbt issue

2014-10-28 Thread Akhil Das
Your proxy/dns could be blocking it.

Thanks
Best Regards

On Tue, Oct 28, 2014 at 4:06 PM, Yanbo Liang yanboha...@gmail.com wrote:

 Maybe you had wrong configuration of sbt proxy.

 2014-10-28 18:27 GMT+08:00 nl19856 hanspeter.sl...@gmail.com:

 Hi,
 I have downloaded the binary spark distribution.
 When building the package with sbt package I get the following:
 [root@nlvora157 ~]# sbt package
 [info] Set current project to Simple Project (in build file:/root/)
 [info] Updating {file:/root/}root...
 [info] Resolving org.apache.spark#spark-core_2.10;1.1.0 ...
 [warn] Host repo1.maven.org not found.
 url=
 https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.pom
 [info] You probably access the destination server through a proxy server
 that is not well configured.
 [warn]  module not found: org.apache.spark#spark-core_2.10;1.1.0
 [warn]  local: tried
 [warn]
 /root/.ivy2/local/org.apache.spark/spark-core_2.10/1.1.0/ivys/ivy.xml
 [warn]  public: tried
 [warn]

 https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.pom
 [info] Resolving org.fusesource.jansi#jansi;1.4 ...
 [warn]  ::
 [warn]  ::  UNRESOLVED DEPENDENCIES ::
 [warn]  ::
 [warn]  :: org.apache.spark#spark-core_2.10;1.1.0: not found
 [warn]  ::
 [warn]
 [warn]  Note: Unresolved dependencies path:
 [warn]  org.apache.spark:spark-core_2.10:1.1.0
 (/root/simple.sbt#L7-8)
 [warn]+- simple-project:simple-project_2.10:1.0
 sbt.ResolveException: unresolved dependency:
 org.apache.spark#spark-core_2.10;1.1.0: not found

 What am I doing wrong?

 Regards Hans-Peter




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/newbie-question-quickstart-example-sbt-issue-tp17477.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org