Re: help check my simple job

2022-02-06 Thread capitnfrakass
That did resolve my issue. Thanks a lot. frakass On 06/02/2022 17:25, Hannes Bibel wrote: Hi, looks like you're packaging your application for Scala 2.13 (should be specified in your build.sbt) while your Spark installation is built for Scala 2.12. Go to

help check my simple job

2022-02-06 Thread capitnfrakass
Hello, I wrote this simple job in Scala: $ cat Myjob.scala import org.apache.spark.sql.SparkSession object Myjob { def main(args: Array[String]): Unit = { val sparkSession = SparkSession.builder.appName("Simple Application").getOrCreate() val sparkContext =

Re: help check my simple job

2022-02-06 Thread Hannes Bibel
Hi, looks like you're packaging your application for Scala 2.13 (should be specified in your build.sbt) while your Spark installation is built for Scala 2.12. Go to https://spark.apache.org/downloads.html, select under "Choose a package type" the package type that says "Scala 2.13". With that
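A build.sbt that pins the Scala version to match a Spark distribution built for Scala 2.12 might look like the sketch below (the Spark version number and dependency list are placeholders, not taken from the thread):

```scala
// build.sbt (sketch): scalaVersion must match the Scala build of your Spark install
ThisBuild / scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  // "provided" because the Spark runtime supplies these jars at spark-submit time
  "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided"
)
```

The `%%` operator appends the Scala binary version to the artifact name, which is exactly where a 2.13-vs-2.12 mismatch comes from.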

Re: Python performance

2022-02-06 Thread Hinko Kocevar
Thanks for your input guys! //hinko On 4 Feb 2022, at 14:58, Sean Owen wrote:  Yes, in the sense that any transformation that can be expressed in the SQL-like DataFrame API will push down to the JVM, and take advantage of other optimizations, avoiding the data movement to/from Python and

dataframe doesn't support higher order func, right?

2022-02-06 Thread capitnfrakass
for example, this works for an RDD object: scala> val li = List(3,2,1,4,0) li: List[Int] = List(3, 2, 1, 4, 0) scala> val rdd = sc.parallelize(li) rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at :24 scala> rdd.filter(_ > 2).collect() res0: Array[Int] = Array(3, 4)

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread Sean Owen
DataFrames are a quite different API, more SQL-like in their operations, not functional. The equivalent would be more like df.filterExpr("value > 2") On Sun, Feb 6, 2022 at 5:51 AM wrote: > for example, this work for RDD object: > > scala> val li = List(3,2,1,4,0) > li: List[Int] = List(3, 2, 1,
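The contrast can be sketched without a cluster. The commented lines assume a hypothetical pyspark session with a SparkSession named `spark`; note that in stock Spark the string-expression form is spelled `DataFrame.filter(...)` (alias `where(...)`):

```python
# RDD style keeps the functional predicate from the quoted example:
data = [3, 2, 1, 4, 0]
kept = [x for x in data if x > 2]  # what rdd.filter(lambda x: x > 2).collect() returns
print(kept)  # [3, 4]

# DataFrame style takes a SQL-like expression string instead of a function.
# (Commented out: requires a live SparkSession named `spark`.)
# df = spark.createDataFrame([(x,) for x in data], ["value"])
# df.filter("value > 2").show()
```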

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread Mich Talebzadeh
Basically you are creating a dataframe (a dataframe is a *Dataset* organized into named columns, conceptually equivalent to a table in a relational database) out of an RDD here. scala> val rdd = sc.parallelize( List(3, 2, 1, 4, 0)) rdd: org.apache.spark.rdd.RDD[Int] =

Unsubscribe

2022-02-06 Thread Yogitha Ramanathan
Unsubscribe

Re: add an auto_increment column

2022-02-06 Thread ayan guha
Try this: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.monotonically_increasing_id.html On Mon, 7 Feb 2022 at 12:27 pm, wrote: > For a dataframe object, how to add a column who is auto_increment like > mysql's behavior? > > Thank you. > >
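Worth noting: the linked function does not produce consecutive values like MySQL's AUTO_INCREMENT. Per the Spark docs, it packs the partition ID into the upper 31 bits and the row number within the partition into the lower 33 bits, so IDs are unique and monotonically increasing but have gaps. A pure-Python sketch of that bit layout (an illustration, not the Spark implementation):

```python
def monotonic_id(partition_id: int, row_in_partition: int) -> int:
    """Mimic the documented bit layout of monotonically_increasing_id():
    partition ID in the upper 31 bits, per-partition row number in the
    lower 33 bits."""
    return (partition_id << 33) + row_in_partition

# Partition 0 counts 0, 1, 2, ...; partition 1 starts at 2**33, so the
# sequence jumps rather than incrementing by one across partitions.
print(monotonic_id(0, 2))  # 2
print(monotonic_id(1, 0))  # 8589934592
```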

add an auto_increment column

2022-02-06 Thread capitnfrakass
For a dataframe object, how do I add a column that is auto_increment, like MySQL's behavior? Thank you. - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Fwd: (send this email to subscribe)

2022-02-06 Thread Madhuchaitanya Joshi
-- Forwarded message - From: Madhuchaitanya Joshi Date: Wed, 19 Jan, 2022, 10:51 Subject: (send this email to subscribe) To: Hello team, I am trying to build and compile the Spark source code using IntelliJ and Eclipse. But I am getting a jackson-bind.jar not found error in

Re: add an auto_increment column

2022-02-06 Thread Siva Samraj
monotonically_increasing_id() will give the same functionality On Mon, 7 Feb, 2022, 6:57 am, wrote: > For a dataframe object, how to add a column who is auto_increment like > mysql's behavior? > > Thank you. > > - > To

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread capitnfrakass
I am a bit confused why this doesn't work in pyspark: x = sc.parallelize([3,2,1,4]) x.toDF.show() Traceback (most recent call last): File "", line 1, in AttributeError: 'function' object has no attribute 'show' Thank you.

TypeError: Can not infer schema for type:

2022-02-06 Thread capitnfrakass
rdd = sc.parallelize([3,2,1,4]) rdd.toDF().show() Traceback (most recent call last): File "", line 1, in File "/opt/spark/python/pyspark/sql/session.py", line 66, in toDF return sparkSession.createDataFrame(self, schema, sampleRatio) File "/opt/spark/python/pyspark/sql/session.py",

Re: TypeError: Can not infer schema for type:

2022-02-06 Thread Sean Owen
You are passing a list of primitives. It expects something like a list of tuples, which can each have 1 int if you like. On Sun, Feb 6, 2022, 10:10 PM wrote: > >>> rdd = sc.parallelize([3,2,1,4]) > >>> rdd.toDF().show() > Traceback (most recent call last): > File "", line 1, in > File
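A sketch of the suggested fix: the wrapping step is plain Python, and the commented line assumes a hypothetical live pyspark session with `sc` and an active SparkSession:

```python
# Schema inference fails on bare ints; wrap each element in a 1-tuple so
# every record is a container with one field Spark can name.
data = [3, 2, 1, 4]
rows = [(x,) for x in data]
print(rows)  # [(3,), (2,), (1,), (4,)]

# With pyspark available, this would then work:
# sc.parallelize(rows).toDF(["value"]).show()
```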

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread Sean Owen
This is just basic Python - you're missing parentheses on toDF, so you are not calling a function nor getting its result. On Sun, Feb 6, 2022 at 9:39 PM wrote: > I am a bit confused why in pyspark this doesn't work? > > >>> x = sc.parallelize([3,2,1,4]) > >>> x.toDF.show() > Traceback (most
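The point generalizes to any Python object: referencing a method without parentheses yields the bound method itself, which has no `show` attribute. A minimal illustration without Spark (the `Dataset` class and its return value are made up for the demo):

```python
class Dataset:
    def toDF(self):
        return "a-dataframe"  # stand-in for the real conversion

x = Dataset()
print(callable(x.toDF))         # True: x.toDF is the bound method, not its result
print(hasattr(x.toDF, "show"))  # False: hence the AttributeError in the excerpt
print(x.toDF())                 # the parentheses actually run the method
```

Scala allows calling zero-argument methods without parentheses, which is why the same line works in spark-shell.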

Re: TypeError: Can not infer schema for type:

2022-02-06 Thread capitnfrakass
Thanks for the reply. It looks strange that in the scala shell I can implement this translation:

scala> sc.parallelize(List(3,2,1,4)).toDF.show
+-----+
|value|
+-----+
|    3|
|    2|
|    1|
|    4|
+-----+

But in pyspark I have to write it as: sc.parallelize([3,2,1,4]).map(lambda x:

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread capitnfrakass
Indeed, in spark-shell I always leave out the parentheses:

scala> sc.parallelize(List(3,2,1,4)).toDF.show
+-----+
|value|
+-----+
|    3|
|    2|
|    1|
|    4|
+-----+

So I think it would be ok in pyspark. But this still doesn't work. Why? sc.parallelize([3,2,1,4]).toDF().show() Traceback

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread Sean Owen
Scala and Python are not the same in this regard. This isn't related to how Spark works. On Sun, Feb 6, 2022, 10:04 PM wrote:
> Indeed. in spark-shell I ignore the parentheses always,
>
> scala> sc.parallelize(List(3,2,1,4)).toDF.show
> +-----+
> |value|
> +-----+
> |    3|
> |    2|
> |    1|