Hi all,
I'm trying to understand the lifecycle of a map function in a
Spark/YARN context. My understanding is that the function is instantiated on the
master and then passed to each executor (serialized/deserialized).
What I'd like to confirm is that the function is
initialized/loaded
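The driver-side instantiation described above can be sketched with plain-Python pickling, which stands in for Spark's closure serialization (the `Tokenizer` function object here is a hypothetical example, not Spark API):

```python
import pickle

class Tokenizer:
    """A map-function-like object, instantiated once on the 'driver'."""
    def __init__(self, sep):
        self.sep = sep          # captured state travels with the object
    def __call__(self, line):
        return line.split(self.sep)

fn = Tokenizer(",")             # instantiated on the driver
payload = pickle.dumps(fn)      # serialized into the task description

# On each 'executor', the function is deserialized into a fresh copy:
executor_fn = pickle.loads(payload)
print(executor_fn("a,b,c"))     # ['a', 'b', 'c']
```

Each executor gets its own deserialized copy, so mutations to the object on one executor are never visible on the driver or on other executors.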
Hi all,
I'm trying to create a dataframe enforcing a schema so that I can write it
to a parquet file. The schema has timestamps, and I get an error with
pyspark. The following is a snippet of code that exhibits the problem:
df = sqlctx.range(1000)
schema = StructType([StructField('a',
As the error says, column FL Date has a different format than you
are expecting. Modify your date format mask appropriately.
On Wed, 11 Apr 2018 at 5:12 pm, @Nandan@ <nandanpriyadarshi...@gmail.com>
wrote:
> Hi ,
> I am not able to use .map function in Spark.
>
> My
Hi ,
I am not able to use .map function in Spark.
My code is as below:
*1) Create Parse function:-*
from datetime import datetime
from collections import namedtuple
fields =
('date','airline','flightnum','origin','dest','dep','dep_delay','arv','arv_delay','airtime','distance')
Flight
Hi,
I have a Spark job which needs to access HBase inside a mapToPair function. The
issue is that I do not want to open and close an HBase connection for
each record.
As I understand, PairFunction is not designed to manage resources with
setup() and close(), like Hadoop reader and writer.
Does
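The usual workaround is to do setup/teardown once per partition rather than per record, which is what `mapPartitions` enables. A plain-Python sketch of that pattern (`FakeConnection` is a stand-in for an HBase client, not a real API):

```python
class FakeConnection:
    """Stand-in for an HBase connection."""
    def __init__(self):
        self.open = True
    def lookup(self, key):
        return key * 2
    def close(self):
        self.open = False

def process_partition(records):
    """Open one connection per partition, not per record."""
    conn = FakeConnection()
    try:
        return [conn.lookup(r) for r in records]
    finally:
        conn.close()

# rdd.mapPartitions(process_partition) would call this once per partition:
print(process_partition([1, 2, 3]))   # [2, 4, 6]
```

The connection is created when the partition starts and closed when it finishes, amortizing the cost over all records in the partition.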
return docPlusRows
> })
>
> JavaEsSparkStreaming.saveToEs(parsedMetadataAndRows,
> "index/type" // ...
But it appears I cannot get access to the sqlContext when I run this on the
spark cluster because that code is executing in the executor not in the
driver.
Is there a way I can access or create a SqlContext to be able to pull the
file down from S3?
If I understand your question, you should look at withColumn in the DataFrame
API (using length from pyspark.sql.functions, since Python's built-in len does
not work on columns):
df.withColumn("len", length("l"))
Thanks
Ankur
On Fri, Apr 14, 2017 at 6:07 AM, issues solution <issues.solut...@gmail.com>
wrote:
> Hi ,
> how you can create column inside map funct
Hi,
how can you create a column inside a map function,
like this:
df.map(lambda l: len(l))
but instead of returning an RDD, we create a column inside the data frame?
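Conceptually, withColumn derives one new value per existing row. A plain-Python sketch of that idea (the dict-per-row layout is an illustration, not Spark's internal representation):

```python
rows = [{"l": "spark"}, {"l": "map"}]

# withColumn("len", length("l")) conceptually derives a new column per row:
with_len = [{**r, "len": len(r["l"])} for r in rows]
print(with_len)   # [{'l': 'spark', 'len': 5}, {'l': 'map', 'len': 3}]
```

Unlike rdd.map, which replaces each row with whatever the lambda returns, withColumn keeps all existing columns and appends the derived one.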
Hello Marcelo,
Finally, what was the solution? I am facing the same problem.
Thank you
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Reference-External-Variables-in-Map-Function-Inner-class-tp11990p28237.html
Sent from the Apache Spark User List mailing list
Please post a minimal complete code example of what you are talking about
On Thu, Dec 15, 2016 at 6:00 PM, Michael Nguyen
<nguyen.m.mich...@gmail.com> wrote:
> I have the following sequence of Spark Java API calls (Spark 2.0.2):
>
> Kafka stream that is processed via a map function
I have the following sequence of Spark Java API calls (Spark 2.0.2):
1. A Kafka stream that is processed via a map function, which returns the
string value from tuple2._2() for JavaDStream, as in
return tuple2._2();
2. The returned JavaDStream is then processed by foreachPartition
ould be aware of when using
> this for production applications. Also is there a "non-experimental" way of
> using map function on Dataframe in Spark 2.0
>
> Thanks,
> Ninad
>
That's a bug. Which version of Spark are you running? Have you tried 2.0.2?
On Wed, Nov 2, 2016 at 12:01 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
> Hi, all.
> When I use a case class as return value in map function, spark always
> raise a ClassCastException.
>
>
Hi, all.
When I use a case class as the return value in a map function, Spark always
raises a ClassCastException.
I wrote a demo, like:
scala> case class Record(key: Int, value: String)
scala> case class ID(key: Int)
scala> val df = Seq(Record(1, "a"), Record(2, "b")
Yes. You can do something like this:
.map(x=>mapfunction(x))
Thanks
Deepak
On 9 Jul 2016 9:22 am, "charles li" <charles.up...@gmail.com> wrote:
>
> hi, guys, is there a way to dynamic load files within the map function.
>
> i.e.
>
> Can I cod
hi, guys, is there a way to dynamically load files within the map function?
i.e.
can I code as below:
thanks a lot.
--
*___*
Quant | Engineer | Boy
*___*
*blog*:http://litaotao.github.io
*github*: www.github.com/litaotao
* from user) as account);
DataFrame accountRdd =
sqlContext.read().format("jdbc").options(options).load();
and I have another RDD which contains the login name, and I want to find the
userid from the above DataFrame and return it.
Not sure how I can do that, as when I apply a map function and a filter
on the DataFrame I get a NullPointerException.
Please help.
In fact you can return “NULL” from your initial map and hence not resort to
Optional<String> at all
From: Evo Eftimov [mailto:evo.efti...@isecc.com]
Sent: Sunday, April 19, 2015 9:48 PM
To: 'Steve Lewis'
Cc: 'Olivier Girardot'; 'user@spark.apache.org'
Subject: RE: Can a map function return
throw Spark exception THEN
as far as I am concerned, chess-mate
From: Steve Lewis [mailto:lordjoe2...@gmail.com]
Sent: Sunday, April 19, 2015 8:16 PM
To: Evo Eftimov
Cc: Olivier Girardot; user@spark.apache.org
Subject: Re: Can a map function return null
So you imagine something like
null
Sent from Samsung Mobile
Original message
From: Olivier Girardot
Date:2015/04/18 22:04 (GMT+00:00)
To: Steve Lewis ,user@spark.apache.org
Subject: Re: Can a map function return null
You can return an RDD with null values inside, and afterwards filter on
item
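A plain-Python sketch of the map-then-filter idea above (mirroring `rdd.map(...).filter(x -> x != null)`; the transform function is illustrative):

```python
def transform(s):
    """Return the transformed value, or None when the record is dropped."""
    return s.upper() if s.startswith("a") else None

data = ["apple", "banana", "avocado"]
mapped = [transform(s) for s in data]        # may contain None
kept = [x for x in mapped if x is not None]  # filter out the nulls afterwards
print(kept)   # ['APPLE', 'AVOCADO']
```

The map stays one-in/one-out, and the nulls are stripped in a separate filter step rather than inside the map itself.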
))
ret.add(transform(s));
return ret; // contains 0 items if isUsed is false
}
});
My question is: can I do a map returning the transformed data, and null if
nothing is to be returned, as shown below? What does Spark do with a map
function returning null?
JavaRDD<Foo> words = original.map(new MapFunction<String, String>() {
@Override
Foo call(final Foo s) throws
)
at
org.apache.spark.streaming.api.java.JavaDStreamLike$class.mapToPair(JavaDStreamLike.scala:146)
at
org.apache.spark.streaming.api.java.JavaPairDStream.mapToPair(JavaPairDStream.scala:46)
at streamReader.App.main(App.java:66)
Is using the sparkContext from inside a map function wrong?
This is the code we are using:
SparkConf conf = new SparkConf().setAppName("Simple Application")
Hi all
Is there an efficient way to trigger RDD transformations? I'm currently using
the count action to achieve this.
Best regards
Kevin
And the winner is foreach(a => a), in line with everyone's expectations.
collect can cause an OOM on the driver, and count is much slower
than the others. Thanks all.
Hi,
I have a RDD like below:
(1, (10, 20))
(2, (30, 40, 10))
(3, (30))
…
Is there any way to map it to this:
(10,1)
(20,1)
(30,2)
(40,2)
(10,2)
(30,3)
…
generally, each element might be mapped to multiple output pairs.
Thanks in advance!
Best,
Yifan LI
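This is a flatMap: each (key, values) pair expands into one (value, key) pair per value. A plain-Python sketch of the same expansion (in Spark it would be a flatMap over the pair RDD):

```python
rdd = [(1, (10, 20)), (2, (30, 40, 10)), (3, (30,))]

# flatMap: emit one (value, key) pair for every value in the tuple
pairs = [(v, k) for (k, vs) in rdd for v in vs]
print(pairs)   # [(10, 1), (20, 1), (30, 2), (40, 2), (10, 2), (30, 3)]
```

Unlike map, which produces exactly one output per input, flatMap lets one input element produce zero or more outputs, which is exactly what the requested shape needs.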
Doing cleanup in an iterator like that assumes the iterator always gets
fully read, which is not necessarily the case (for example, RDD.take does not).
Instead I would use mapPartitionsWithContext, in which case you can write a
function of the form
f: (TaskContext, Iterator[T]) => Iterator[U]
now
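A plain-Python sketch of the pitfall described above: cleanup appended to the end of an iterator never runs when the consumer stops early (as RDD.take does):

```python
def records_with_cleanup(log):
    """Yield records, then 'close' the resource after the last one."""
    for r in [1, 2, 3]:
        yield r
    log.append("closed")   # only reached if the iterator is fully drained

log = []
it = records_with_cleanup(log)
first_two = [next(it), next(it)]   # a take(2)-style partial read
print(first_two, log)              # [1, 2] [] -- cleanup never ran
```

This is why hooking cleanup to a task-completion callback (as mapPartitionsWithContext allows via the TaskContext) is safer than embedding it in the iterator itself.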
Is there anyone who can provide some example code using Java multi-threading
in an RDD transformation, or give any ideas on increasing the concurrency?
Thanks, all.
In the options of spark-submit, there are two options which may be helpful to
your problem: --total-executor-cores NUM (standalone and mesos only) and
--executor-cores (yarn only).
qinwei
From: myasuka
Date: 2014-09-28 11:44
To: user
Subject: How to use multi thread in RDD map function
:45 PM, Shekhar Bansal wrote:
Hi
In one of our use cases, the filename contains a timestamp and we have to
append it to the record for aggregation.
How can I access the filename in a map function?
Thanks!
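One option is to read with an API that keeps the filename, such as sc.wholeTextFiles, which yields (path, content) pairs. A plain-Python sketch of the resulting mapping (the file paths and timestamp-parsing convention here are illustrative assumptions):

```python
import os

# wholeTextFiles-style input: (path, content) pairs
files = [
    ("/data/metrics-20140928.csv", "a,1\nb,2"),
    ("/data/metrics-20140929.csv", "c,3"),
]

def with_timestamp(path, content):
    """Pull the timestamp out of the filename and append it to each record."""
    stamp = os.path.basename(path).split("-")[1].split(".")[0]
    return [line + "," + stamp for line in content.splitlines()]

records = [r for path, content in files for r in with_timestamp(path, content)]
print(records)   # ['a,1,20140928', 'b,2,20140928', 'c,3,20140929']
```

Note that wholeTextFiles reads each file as a single record, so it suits many small files rather than a few huge ones.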
I have a class defining an inner static class (the map function). The inner
class tries to refer to a variable instantiated in the outer class, which
results in a NullPointerException. Sample code as follows:
class SampleOuterClass {
private static ArrayList<String> someVariable
are going to be serialized in the closure? The instance of Parse will be
looking at its local SampleOuterClass, which is maybe not initialized on the
remote JVM.
On Tue, Aug 12, 2014 at 6:02 PM, Sunny Khatri sunny.k...@gmail.com
wrote:
I have a class defining an inner static class (map function). The inner
class tries to refer to the variable instantiated in the outer class, which
results in a NullPointerException.
in
Hadoop. (At least there was no direct way.) Same here. But you can
maintain state across many map calls.
On Wed, Jul 30, 2014 at 6:07 PM, Kevin kevin.macksa...@gmail.com wrote:
Hi,
Is it possible to maintain state inside a Spark map function? With Hadoop
MapReduce, Mappers
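A plain-Python sketch of maintaining state across map calls within one partition (the pattern mapPartitions enables; the running counter is an illustrative example of per-partition state):

```python
def number_records(records):
    """State (a counter) survives across all records of the partition."""
    count = 0
    out = []
    for r in records:
        count += 1
        out.append((count, r))
    return out

# rdd.mapPartitions(number_records) would apply this once per partition:
print(number_records(["a", "b", "c"]))   # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The state is local to each partition's task; it is not shared across partitions or across executors, so it cannot replace a global accumulator.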
Hello everyone,
I am having some problems with a simple Scala/Spark code in which I am
trying to replace certain fields in a CSV with their hashes:
class DSV(var line: String = "", fieldsList: Seq[Int], var
delimiter: String = ",") extends Serializable {
def hash(s: String): String = {
Does this line, println("Returning " + string), from the hash function
print what you expect? If you're not seeing that output in the
executor log, I'd also put some debug statements in the other cases, since
your match in the interesting case is conditioned on if
(fieldsList.contains(index)) -- maybe that
Hi Spark folk,
I have a directory full of files that I want to process using PySpark.
There is some necessary metadata in the filename that I would love to
attach to each record in that file. Using Java MapReduce, I would access
((FileSplit) context.getInputSplit()).getPath().getName()
in the