Lifecycle of a map function

2020-04-07 Thread Vadim Vararu
Hi all, I'm trying to understand the lifecycle of a map function in a Spark/YARN context. My understanding is that the function is instantiated on the master and then passed to each executor (serialized/deserialized). What I'd like to confirm is that the function is initialized/loaded
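
A minimal Scala sketch of that distinction, assuming an existing SparkContext sc (the object and values are illustrative, not from the thread):

    // The closure below is serialized on the driver and deserialized into each
    // executor JVM; anything reached through an object's lazy val is initialized
    // on first use on the executor side, not on the driver.
    object Heavy {
      lazy val resource: String = {
        println("initialized on an executor")  // appears in executor logs
        "ready"
      }
    }

    val doubled = sc.parallelize(1 to 10).map { x =>
      Heavy.resource   // touched inside the task, so init happens once per executor JVM
      x * 2
    }.collect()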

Pyspark error when converting string to timestamp in map function

2018-08-17 Thread Keith Chapman
Hi all, I'm trying to create a dataframe enforcing a schema so that I can write it to a parquet file. The schema has timestamps and I get an error with pyspark. The following is a snippet of code that exhibits the problem, df = sqlctx.range(1000) schema = StructType([StructField('a',
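
For comparison, a hedged Scala sketch of the same idea (the thread is PySpark; the column names and format string are assumptions): casting the string to a timestamp explicitly before writing usually avoids schema-enforcement errors.

    import org.apache.spark.sql.functions.{col, to_timestamp}

    // Cast the string column to TimestampType up front so the written
    // parquet schema carries a real timestamp.
    val withTs = df.withColumn("ts", to_timestamp(col("ts_str"), "yyyy-MM-dd HH:mm:ss"))
    withTs.write.parquet("/tmp/out.parquet")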

Re: Issue with map function in Spark 2.2.0

2018-04-11 Thread ayan guha
As the error says clearly, the column FL Date has a different format than you are expecting. Modify your date format mask appropriately. On Wed, 11 Apr 2018 at 5:12 pm, @Nandan@ <nandanpriyadarshi...@gmail.com> wrote: > Hi , > I am not able to use .map function in Spark. > > My
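
A hedged Scala illustration of the point (the thread's code is Python; the sample value and mask are assumptions):

    import java.time.LocalDate
    import java.time.format.DateTimeFormatter

    // If FL Date arrives as "2018-04-11", this mask parses; a mismatched mask
    // such as "MM/dd/yyyy" fails with a parse exception, as in the thread.
    val mask = DateTimeFormatter.ofPattern("yyyy-MM-dd")
    val flDate = LocalDate.parse("2018-04-11", mask)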

Issue with map function in Spark 2.2.0

2018-04-11 Thread @Nandan@
Hi, I am not able to use the .map function in Spark. My code is as below: *1) Create Parse function:* from datetime import datetime from collections import namedtuple fields = ('date','airline','flightnum','origin','dest','dep','dep_delay','arv','arv_delay','airtime','distance') Flight

Resource management inside map function

2018-03-30 Thread Huiliang Zhang
Hi, I have a Spark job which needs to access HBase inside a mapToPair function. The issue is that I do not want to open and close an HBase connection for each record. As I understand, PairFunction is not designed to manage resources with setup() and close(), like a Hadoop reader and writer. Does
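
A minimal Scala sketch of the usual workaround, one connection per partition via mapPartitions (openConnection and lookup are hypothetical stand-ins for the HBase client calls):

    val enriched = rdd.mapPartitions { iter =>
      val conn = openConnection()                      // once per partition, not per record
      val out = iter.map(r => lookup(conn, r)).toList  // force consumption before closing
      conn.close()
      out.iterator
    }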

Re: Spark SQL within a DStream map function

2017-06-16 Thread Burak Yavuz
return docPlusRows > }) > > JavaEsSparkStreaming.saveToEs(parsedMetadataAndRows, > "index/type" // ... > > > But it appears I cannot get access to the sqlContext when I run this on > the spark cluster because that code is executing in

Spark SQL within a DStream map function

2017-06-16 Thread Mike Hugo
ng.saveToEs(parsedMetadataAndRows, "index/type" // ... But it appears I cannot get access to the sqlContext when I run this on the spark cluster because that code is executing in the executor not in the driver. Is there a way I can access or create a SqlContext to be able to pull the file down from S3
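
A minimal Scala sketch of the usual pattern, moving the SQL entry point into foreachRDD, which runs on the driver (Spark 2.x; the DStream name and the JSON assumption are illustrative):

    import org.apache.spark.sql.SparkSession

    // parsed is assumed to be a DStream[String] of JSON records.
    parsed.foreachRDD { rdd =>
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      val df = spark.read.json(rdd)   // planned on the driver, executed on executors
      // ... join with the S3-hosted data here, then save to Elasticsearch
    }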

Re: create column with map function applied to dataframe

2017-04-14 Thread Ankur Srivastava
If I understand your question, you should look at withColumn of the dataframe api: df.withColumn("len", length(df["l"])) Thanks Ankur On Fri, Apr 14, 2017 at 6:07 AM, issues solution <issues.solut...@gmail.com> wrote: > Hi , > how you can create column inside map funct

create column with map function applied to dataframe

2017-04-14 Thread issues solution
Hi, how can you create a column inside a map function, like this: df.map(lambda l: len(l)), but instead of returning an RDD, create the column inside the data frame?
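
A minimal Scala sketch of the withColumn approach suggested in the reply (column names as in the thread):

    import org.apache.spark.sql.functions.{col, length}

    // Adds the length of column "l" as a new column, staying in the DataFrame API
    // instead of dropping to an RDD.
    val withLen = df.withColumn("len", length(col("l")))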

Re: Reference External Variables in Map Function (Inner class)

2016-12-19 Thread mbayebabacar
Hello Marcelo, Finally, what was the solution? I face the same problem. Thank you

Re: Why does foreachPartition make duplicate invocations of the map function for every message? (Spark 2.0.2)

2016-12-16 Thread Cody Koeninger
Please post a minimal complete code example of what you are talking about On Thu, Dec 15, 2016 at 6:00 PM, Michael Nguyen <nguyen.m.mich...@gmail.com> wrote: > I have the following sequence of Spark Java API calls (Spark 2.0.2): > > Kafka stream that is processed via a map function

Why does foreachPartition make duplicate invocations of the map function for every message? (Spark 2.0.2)

2016-12-15 Thread Michael Nguyen
I have the following sequence of Spark Java API calls (Spark 2.0.2): 1. A Kafka stream that is processed via a map function, which returns the string value from tuple2._2() for JavaDStream, as in return tuple2._2(); 2. The returned JavaDStream is then processed by foreachPartition
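
A minimal Scala sketch of the usual explanation and fix, caching the mapped stream so its lineage is not re-evaluated by each output operation (process is hypothetical):

    // messages corresponds to the mapped stream from step 1.
    val messages = kafkaStream.map(_._2)
    messages.cache()   // without this, each output op re-runs the map lineage
    messages.foreachRDD { rdd =>
      rdd.foreachPartition(iter => iter.foreach(record => process(record)))
    }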

Re: [DataFrames] map function - 2.0

2016-12-15 Thread Michael Armbrust
ould be aware of when using > this for production applications. Also is there a "non-experimental" way of > using map function on Dataframe in Spark 2.0 > > Thanks, > Ninad >

[DataFrames] map function - 2.0

2016-12-15 Thread Ninad Shringarpure
"non-experimental" way of using map function on Dataframe in Spark 2.0 Thanks, Ninad
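
A minimal Scala sketch of the typed alternative (the case class and schema are assumptions):

    // A typed Dataset map avoids the then-experimental DataFrame map.
    case class Rec(a: Long)
    import spark.implicits._
    val mapped = df.as[Rec].map(r => r.a + 1)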

Re: How to return a case class in map function?

2016-11-03 Thread Yan Facai
; > wrote: > >> Thats a bug. Which version of Spark are you running? Have you tried >> 2.0.2? >> >> On Wed, Nov 2, 2016 at 12:01 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote: >> >>> Hi, all. >>> When I use a case class as return value in m

Re: How to return a case class in map function?

2016-11-03 Thread Yan Facai
发才(Yan Facai) <yaf...@gmail.com> wrote: > >> Hi, all. >> When I use a case class as return value in map function, spark always >> raise a ClassCastException. >> >> I write an demo, like: >> >> scala> case class Record(key: Int, value: String)

Re: How to return a case class in map function?

2016-11-02 Thread Michael Armbrust
That's a bug. Which version of Spark are you running? Have you tried 2.0.2? On Wed, Nov 2, 2016 at 12:01 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote: > Hi, all. > When I use a case class as the return value in a map function, Spark always > raises a ClassCastException. > >

How to return a case class in map function?

2016-11-02 Thread Yan Facai
Hi, all. When I use a case class as the return value in a map function, Spark always raises a ClassCastException. I wrote a demo, like: scala> case class Record(key: Int, value: String) scala> case class ID(key: Int) scala> val df = Seq(Record(1, "a"), Record(2, "b")
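
A minimal Scala sketch of the intended usage, which the thread reports as fixed in Spark 2.0.2+:

    import spark.implicits._

    case class Record(key: Int, value: String)
    case class ID(key: Int)

    val df = Seq(Record(1, "a"), Record(2, "b")).toDF()
    val ids = df.as[Record].map(r => ID(r.key))   // returning a case class from map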

Re: Is there a way to dynamically load files [parquet or csv] in the map function?

2016-07-08 Thread Deepak Sharma
Yes. You can do something like this: .map(x => mapfunction(x)) Thanks Deepak On 9 Jul 2016 9:22 am, "charles li" <charles.up...@gmail.com> wrote: > > hi, guys, is there a way to dynamically load files within the map function. > > i.e. > > Can I cod

Is there a way to dynamically load files [parquet or csv] in the map function?

2016-07-08 Thread charles li
Hi guys, is there a way to dynamically load files within the map function? i.e. can I code as below: thanks a lot. -- *___* Quant | Engineer | Boy *___* *blog*: http://litaotao.github.io *github*: www.github.com/litaotao
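
A minimal Scala sketch of the usual driver-side alternative, since executors cannot start new reads themselves (the paths are assumptions):

    // The dynamic paths would be collected to the driver first, then each is
    // read there and the results combined.
    val paths = Seq("/data/part1.parquet", "/data/part2.parquet")
    val combined = paths.map(p => spark.read.parquet(p)).reduce(_ union _)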

Re: DataFrame Find/Filter Based on Input - Inside Map function

2015-07-02 Thread ayan guha
* from user) as account); DataFrame accountRdd = sqlContext.read().format("jdbc").options(options).load(); and I have another RDD which contains the login name, and I want to find the userid from the above DF and return it. Not sure how I can do that, as when I apply a map function and, say, filter on the DF, I get a null pointer exception. Please help.

Re: DataFrame Find/Filter Based on Input - Inside Map function

2015-07-01 Thread ayan guha
(); and I have another RDD which contains the login name, and I want to find the userid from the above DF and return it. Not sure how I can do that, as when I apply a map function and, say, filter on the DF, I get a null pointer exception. Please help.

Re: DataFrame Find/Filter Based on Input - Inside Map function

2015-07-01 Thread Mailing List
contains the login name, and I want to find the userid from the above DF and return it. Not sure how I can do that, as when I apply a map function and, say, filter on the DF, I get a null pointer exception. Please help.

DataFrame Find/Filter Based on Input - Inside Map function

2015-07-01 Thread Ashish Soni
it. Not sure how I can do that, as when I apply a map function and, say, filter on the DF, I get a null pointer exception. Please help.
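
A minimal Scala sketch of the usual fix, replacing the per-record lookup (which NPEs on executors) with a join planned on the driver (column names are assumptions):

    import spark.implicits._

    val logins = loginRdd.toDF("login")
    val userIds = logins.join(accountDf, "login").select("login", "userid")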

RE: Can a map function return null

2015-04-19 Thread Evo Eftimov
In fact you can return "NULL" from your initial map and hence not resort to Optional<String> at all. From: Evo Eftimov [mailto:evo.efti...@isecc.com] Sent: Sunday, April 19, 2015 9:48 PM To: 'Steve Lewis' Cc: 'Olivier Girardot'; 'user@spark.apache.org' Subject: RE: Can a map function return

RE: Can a map function return null

2015-04-19 Thread Evo Eftimov
throw Spark exception THEN as far as I am concerned, chess-mate From: Steve Lewis [mailto:lordjoe2...@gmail.com] Sent: Sunday, April 19, 2015 8:16 PM To: Evo Eftimov Cc: Olivier Girardot; user@spark.apache.org Subject: Re: Can a map function return null So you imagine something like

Re: Can a map function return null

2015-04-19 Thread Steve Lewis
null Sent from Samsung Mobile Original message From: Olivier Girardot Date:2015/04/18 22:04 (GMT+00:00) To: Steve Lewis ,user@spark.apache.org Subject: Re: Can a map function return null You can return an RDD with null values inside, and afterwards filter on item

Can a map function return null

2015-04-18 Thread Steve Lewis
)) ret.add(transform(s)); return ret; // contains 0 items if isUsed is false } }); My question is: can I do a map returning the transformed data, and null if nothing is to be returned, as shown below - what does Spark do with a map function returning

Re: Can a map function return null

2015-04-18 Thread Olivier Girardot
is can I do a map returning the transformed data and null if nothing is to be returned, as shown below - what does Spark do with a map function returning null: JavaRDD<Foo> words = original.map(new MapFunction<String, String>() { @Override Foo call(final Foo s) throws
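
A minimal Scala sketch of the Option/flatMap alternative discussed in the thread (transform is the poster's hypothetical function):

    // Option(null) is None, so records that would have been null simply
    // disappear from the output - no separate filter pass needed.
    val words = original.flatMap(s => Option(transform(s)))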

Using sparkContext from inside a map function

2015-03-08 Thread danielil
) at org.apache.spark.streaming.api.java.JavaDStreamLike$class.mapToPair(JavaDStreamLike.scala:146) at org.apache.spark.streaming.api.java.JavaPairDStream.mapToPair(JavaPairDStream.scala:46) at streamReader.App.main(App.java:66) Is using the sparkContext from inside a map function wrong

Using sparkContext from inside a map function

2015-03-08 Thread danielil
) at org.apache.spark.streaming.api.java.JavaDStreamLike$class.mapToPair(JavaDStreamLike.scala:146) at org.apache.spark.streaming.api.java.JavaPairDStream.mapToPair(JavaPairDStream.scala:46) at streamReader.App.main(App.java:66) Is using the sparkContext from inside a map function

using sparkContext from within a map function (from spark streaming app)

2015-03-08 Thread Daniel Haviv
) at org.apache.spark.streaming.api.java.JavaDStreamLike$class.mapToPair(JavaDStreamLike.scala:146) at org.apache.spark.streaming.api.java.JavaPairDStream.mapToPair(JavaPairDStream.scala:46) at streamReader.App.main(App.java:66) Is using the sparkContext from inside a map function wrong

Re: using sparkContext from within a map function (from spark streaming app)

2015-03-08 Thread Sean Owen
) at org.apache.spark.streaming.api.java.JavaPairDStream.mapToPair(JavaPairDStream.scala:46) at streamReader.App.main(App.java:66) Is using the sparkContext from inside a map function wrong? This is the code we are using: SparkConf conf = new SparkConf().setAppName("Simple Application
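
A minimal Scala sketch of the usual alternative, doing driver-side work in transform instead of map (the reference path is an assumption):

    // transform runs its body on the driver once per batch, so the
    // SparkContext is available there; the per-record work still runs on executors.
    val out = stream.transform { rdd =>
      val sc = rdd.sparkContext                  // legal here, unlike inside map
      val side = sc.textFile("/reference/data")
      rdd   // ... e.g. join rdd against side here
    }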

Manually trigger RDD map function without action

2015-01-12 Thread Kevin Jung
Hi all, Is there an efficient way to trigger RDD transformations? I'm now using the count action to achieve this. Best regards Kevin

Re: Manually trigger RDD map function without action

2015-01-12 Thread Cody Koeninger
to trigger RDD transformations? I'm now using count action to achieve this. Best regards Kevin

Re: Manually trigger RDD map function without action

2015-01-12 Thread kevinkim

Re: Manually trigger RDD map function without action

2015-01-12 Thread Sven Krasser

Re: Manually trigger RDD map function without action

2015-01-12 Thread Kevin Jung
And the winner is foreach(a => a), according to everyone's expectations. collect can cause OOM on the driver, and count is much slower than the others. Thanks all.
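
A minimal sketch of that winning option:

    rdd.foreach(_ => ())   // forces the lineage; ships nothing back to the driver
    // vs. rdd.count() (full scan plus aggregation) or rdd.collect() (OOM risk)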

map function

2014-12-04 Thread Yifan LI
Hi, I have an RDD like below: (1, (10, 20)) (2, (30, 40, 10)) (3, (30)) … Is there any way to map it to this: (10,1) (20,1) (30,2) (40,2) (10,2) (30,3) … Generally, each element might be mapped to multiple pairs. Thanks in advance! Best, Yifan LI
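
A minimal Scala sketch using flatMap, matching the requested output:

    // Emit one (value, key) pair per element of each value collection.
    val pairs = sc.parallelize(Seq(1 -> Seq(10, 20), 2 -> Seq(30, 40, 10), 3 -> Seq(30)))
    val inverted = pairs.flatMap { case (k, vs) => vs.map(v => (v, k)) }
    // (10,1) (20,1) (30,2) (40,2) (10,2) (30,3)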

Re: map function

2014-12-04 Thread Paolo Platter
@spark.apache.org Subject: map function Hi, I have an RDD like below: (1, (10, 20)) (2, (30, 40, 10)) (3, (30)) … Is there any way to map it to this: (10,1) (20,1) (30,2) (40,2) (10,2) (30,3) … Generally, each element might be mapped to multiple pairs. Thanks in advance! Best, Yifan LI

Re: map function

2014-12-04 Thread Yifan LI
...@gmail.com Sent: 04/12/2014 09:27 To: user@spark.apache.org Subject: map function Hi, I have an RDD like below: (1, (10, 20)) (2, (30, 40, 10)) (3, (30)) … Is there any way to map it to this: (10,1) (20,1) (30,2) (40,2) (10,2) (30,3

Re: Keep state inside map function

2014-10-28 Thread Koert Kuipers
Doing cleanup in an iterator like that assumes the iterator always gets fully read, which is not necessarily the case (for example, RDD.take does not). Instead I would use mapPartitionsWithContext, in which case you can write a function of the form f: (TaskContext, Iterator[T]) => Iterator[U]. Now
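
mapPartitionsWithContext has since been removed from the public API; a hedged sketch of the equivalent with TaskContext (Spark 2.4+ listener signature; openResource and use are hypothetical):

    import org.apache.spark.TaskContext

    // The completion listener fires even when the iterator is only
    // partially consumed (e.g. by rdd.take).
    val out = rdd.mapPartitions { iter =>
      val res = openResource()
      TaskContext.get().addTaskCompletionListener[Unit](_ => res.close())
      iter.map(x => use(res, x))
    }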

Re: How to use multi thread in RDD map function?

2014-09-28 Thread myasuka
, there are two options which may be helpful to your problem, they are --total-executor-cores NUM (standalone and mesos only), --executor-cores (yarn only). qinwei From: myasuka Date: 2014-09-28 11:44 To: user Subject: How to use multi thread in RDD map function? Hi, everyone, I come across

Re: How to use multi thread in RDD map function?

2014-09-28 Thread Sean Owen

How to use multi thread in RDD map function?

2014-09-27 Thread myasuka
. Is there anyone who can provide some example code using Java multi-threading in an RDD transformation, or any idea to increase the concurrency? Thanks for all

Re: How to use multi thread in RDD map function?

2014-09-27 Thread qinwei
in the options of spark-submit, there are two options which may be helpful to your problem, they are --total-executor-cores NUM (standalone and mesos only), --executor-cores (yarn only). qinwei From: myasuka Date: 2014-09-28 11:44 To: user Subject: How to use multi thread in RDD map function
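
A minimal Scala sketch of that advice, getting concurrency from partitions and executor cores rather than threads spawned inside map (the values and compute are assumptions):

    import org.apache.spark.SparkConf

    // Equivalent to the --executor-cores CLI flag mentioned above.
    val conf = new SparkConf().set("spark.executor.cores", "4")
    val results = data.repartition(200).map(compute)   // parallelism = partitions x cores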

Re: Access file name in map function

2014-09-26 Thread Cheng Lian
:45 PM, Shekhar Bansal wrote: Hi, in one of our use cases the filename contains a timestamp and we have to append it to the record for aggregation. How can I access the filename in a map function? Thanks!
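
In current Spark there are two common routes, sketched here with assumed paths:

    import org.apache.spark.sql.functions.input_file_name

    // SQL route: attach the source file path to every row.
    val withFile = spark.read.text("/data/in").withColumn("file", input_file_name())

    // RDD route: each element is (path, whole file content).
    val byFile = sc.wholeTextFiles("/data/in")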

Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
I have a class defining an inner static class (map function). The inner class tries to refer to a variable instantiated in the outer class, which results in a NullPointerException. Sample code as follows: class SampleOuterClass { private static ArrayList<String> someVariable

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
will be looking at its local SampleOuterClass, which is maybe not initialized on the remote JVM. On Tue, Aug 12, 2014 at 6:02 PM, Sunny Khatri sunny.k...@gmail.com wrote: I have a class defining an inner static class (map function). The inner class tries to refer the variable instantiated

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Marcelo Vanzin
not initialized on the remote JVM. On Tue, Aug 12, 2014 at 6:02 PM, Sunny Khatri sunny.k...@gmail.com wrote: I have a class defining an inner static class (map function). The inner class tries to refer the variable instantiated in the outside class, which results in a NullPointerException. Sample

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
are going to be serialized in the closure? the instance of Parse will be looking at its local SampleOuterClass, which is maybe not initialized on the remote JVM. On Tue, Aug 12, 2014 at 6:02 PM, Sunny Khatri sunny.k...@gmail.com wrote: I have a class defining an inner static class (map
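
A minimal Scala sketch of one common fix, shipping the value explicitly as a broadcast variable instead of relying on a static field initialized only in the driver JVM (contents are assumptions):

    val shared = sc.broadcast(Seq("a", "b", "c"))
    val parsed = rdd.map { line =>
      val vars = shared.value   // defined on every executor
      // ... use vars while parsing line
      line
    }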

Re: Keep state inside map function

2014-07-30 Thread Kevin
in Hadoop. (At least there was no direct way.) Same here. But you can maintain state across many map calls. On Wed, Jul 30, 2014 at 6:07 PM, Kevin kevin.macksa...@gmail.com wrote: Hi, Is it possible to maintain state inside a Spark map function? With Hadoop MapReduce, Mappers
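
A minimal Scala sketch of state kept across map calls within one partition, the closest analogue to a Mapper's instance fields:

    val numbered = rdd.mapPartitions { iter =>
      var seen = 0L   // persists across calls for this partition
      iter.map { x => seen += 1; (x, seen) }
    }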

Map Function does not seem to be executing over RDD

2014-07-09 Thread Raza Rehman
Hello everyone, I am having a problem with a simple Scala/Spark code in which I am trying to replace certain fields in a CSV with their hashes: class DSV(var line: String = "", fieldsList: Seq[Int], var delimiter: String = ",") extends Serializable { def hash(s: String): String = {
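
One hedged thing to check in such cases: transformations are lazy, so nothing in map runs until an action, and executor-side println output lands in executor logs, not the driver console (hashLine and the field indices below are assumptions):

    val hashed = lines.map(l => new DSV(l, Seq(1, 3), ",").hashLine())
    hashed.count()   // an action; only now does the map actually execute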

Re: Map Function does not seem to be executing over RDD

2014-07-09 Thread Yana Kadiyska
Does this line println("Retuning " + string) from the hash function print what you expect? If you're not seeing that output in the executor log, I'd also put some debug statements in the other cases, since your match in the interesting case is conditioned on if (fieldsList.contains(index)) -- maybe that

Access original filename in a map function

2014-03-18 Thread Uri Laserson
Hi spark-folk, I have a directory full of files that I want to process using PySpark. There is some necessary metadata in the filename that I would love to attach to each record in that file. Using Java MapReduce, I would access ((FileSplit) context.getInputSplit()).getPath().getName() in the
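
A minimal Scala sketch of one common route, wholeTextFiles, which pairs each file's path with its content (the path is an assumption):

    // Pairs every line with the path of the file it came from.
    val records = sc.wholeTextFiles("/data/in").flatMap { case (path, content) =>
      content.split("\n").map(line => (path, line))
    }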