Streaming partition-by data locality for state lookup on executor

2022-04-13 Thread Sandip Khanzode
Kinesis? What I would finally want to achieve is that the flatMapGroupsWithState() that I call later in the pipeline should have the same (partition) key internally for key lookups in the (RocksDB) state, so that data locality can be achieved. Is this redundant, implicit, or not possible?
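For reference, a minimal Structured Streaming sketch of the pattern being asked about, using the built-in rate source as a stand-in for Kinesis; the key name, state type, and update function are placeholders, and whether an extra explicit repartition before the stateful operator would be redundant is exactly the open question in the thread:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

    val spark = SparkSession.builder.appName("state-locality-sketch").getOrCreate()
    import spark.implicits._

    case class KeyedEvent(key: String, value: Long)
    case class KeyState(count: Long)

    // Per-key state update; the state store partition for `key` lives on the
    // executor that owns the corresponding shuffle partition.
    def update(key: String, events: Iterator[KeyedEvent],
               state: GroupState[KeyState]): Iterator[(String, Long)] = {
      val next = KeyState(state.getOption.map(_.count).getOrElse(0L) + events.size)
      state.update(next)
      Iterator((key, next.count))
    }

    val events = spark.readStream.format("rate").option("rowsPerSecond", "10").load()
      .select($"value").as[Long]
      .map(v => KeyedEvent(s"key-${v % 8}", v))   // placeholder key extraction

    val counts = events
      .groupByKey(_.key)   // hash-partitions by the same key used for state lookups
      .flatMapGroupsWithState(OutputMode.Update(), GroupStateTimeout.NoTimeout())(update)

    val query = counts.writeStream.outputMode("update").format("console").start()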

Re: [Spark Core][Advanced]: Problem with data locality when running Spark query with local nature on apache Hadoop

2021-04-13 Thread Russell Spitzer
which IP or hostname of the data-nodes is returned from the name-node to Spark? Or can you offer me a debug approach? > On Farvardin 24, 1400 AP, at 17:45, Russell Spitzer wrote: > Data locality can only occur if the Spark

Re: [Spark Core][Advanced]: Problem with data locality when running Spark query with local nature on apache Hadoop

2021-04-13 Thread Russell Spitzer
Data locality can only occur if the Spark executor's IP address string matches the preferred location returned by the file system. So this job would only have local tasks if the datanode replicas for the files in question had the same IP address as the Spark executors you are using. If they don't
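A small diagnostic sketch along those lines (the HDFS path is a placeholder): print the preferred-location strings HDFS reports for each input partition next to the host strings the executors registered with, so a mismatch is visible directly.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("locality-check").getOrCreate()
    val sc = spark.sparkContext

    // Hosts the file system advertises as preferred locations per input split.
    val rdd = sc.textFile("hdfs:///data/events/part-00000")
    rdd.partitions.foreach { p =>
      println(s"partition ${p.index}: preferred = ${rdd.preferredLocations(p).mkString(", ")}")
    }

    // "host:port" strings for the executors (plus the driver) known to the scheduler.
    sc.getExecutorMemoryStatus.keys.foreach(h => println(s"executor: $h"))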

[Spark Core][Advanced]: Problem with data locality when running Spark query with local nature on apache Hadoop

2021-04-13 Thread Mohamadreza Rostami
https://stackoverflow.com/questions/66612906/problem-with-data-locality-when-running-spark-query-with-local-nature-on-apache

[ spark-streaming ] - Data Locality issue

2020-02-04 Thread Karthik Srinivas
Hi, I am using Spark 2.3.2 and I am facing issues due to data locality: even after setting spark.locality.wait.rack=200, the locality level is always RACK_LOCAL. Can someone help me with this? Thank you

Data locality

2020-02-04 Thread Karthik Srinivas
Hi all, I am using Spark 2.3.2 and I am facing issues due to data locality: even after setting spark.locality.wait.rack=200, the locality level is always RACK_LOCAL. Can someone help me with this? Thank you
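For reference, these are the locality-wait knobs the two posts above are adjusting; a minimal sketch with illustrative time values (not recommendations) and a placeholder app name:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("locality-wait-sketch")
      .config("spark.locality.wait", "3s")         // base wait before dropping a locality level
      .config("spark.locality.wait.process", "3s") // PROCESS_LOCAL -> NODE_LOCAL
      .config("spark.locality.wait.node", "3s")    // NODE_LOCAL -> RACK_LOCAL
      .config("spark.locality.wait.rack", "3s")    // RACK_LOCAL -> ANY
      .getOrCreate()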

Re: [Spark 2.0.2 HDFS]: no data locality

2016-12-28 Thread Miguel Morales
thanks for answering! >> Although the Spark task scheduler is aware of rack-level data locality, it seems that only YARN implements the support for it. > This explains why the script that I configured in

Re: [Spark 2.0.2 HDFS]: no data locality

2016-12-28 Thread Karamba
>> Although the Spark task scheduler is aware of rack-level data locality, it seems that only YARN implements the support for it. > This explains why the script that I configured in core-site.xml topology.script.file.name is not called by the Spark

Re: [Spark 2.0.2 HDFS]: no data locality

2016-12-28 Thread Miguel Morales
>> Although the Spark task scheduler is aware of rack-level data locality, it seems that only YARN implements the support for it. > This explains why the script that I configured in core-site.xml topology.script.file.name is not called by the Spark

Re: [Spark 2.0.2 HDFS]: no data locality

2016-12-28 Thread Karamba
Hi Sun Rui, thanks for answering! > Although the Spark task scheduler is aware of rack-level data locality, it seems that only YARN implements the support for it. This explains why the script that I configured in core-site.xml topology.script.file.name is not called by the

Re: [Spark 2.0.2 HDFS]: no data locality

2016-12-27 Thread Sun Rui
Although the Spark task scheduler is aware of rack-level data locality, it seems that only YARN implements the support for it. However, node-level locality can still work for Standalone. It is not necessary to copy the Hadoop config files into the Spark conf directory. Set HADOOP_CONF_DIR
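A quick sanity check (a sketch, not from the thread) that the standalone driver actually sees the Hadoop configuration pointed to by HADOOP_CONF_DIR, which is what lets it resolve HDFS block locations and the topology script setting mentioned in this thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("hadoop-conf-check").getOrCreate()
    val hc = spark.sparkContext.hadoopConfiguration
    println(s"fs.defaultFS = ${hc.get("fs.defaultFS")}")                           // should point at the HDFS namenode
    println(s"topology.script.file.name = ${hc.get("topology.script.file.name")}") // rack topology script, if configured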

[Spark 2.0.2 HDFS]: no data locality

2016-12-26 Thread Karamba
Hi, I am running a couple of Docker hosts, each with an HDFS node and a Spark worker in a Spark standalone cluster. In order to get data locality awareness, I would like to configure racks for each host, so that a Spark worker container knows from which HDFS node container it should load its data

Re: Does Spark use data locality information from HDFS when running in standalone mode?

2016-06-05 Thread Eugene Morozov
> Does Spark use data locality information from HDFS when running in standalone mode? Or is running on YARN mandatory for such a purpose? I can't find this information in the docs, and on Google I am only finding contrasting opinions on that. > Regards, Marco Capuccini

Re: Does Spark use data locality information from HDFS when running in standalone mode?

2016-06-05 Thread Mich Talebzadeh
will know about the datanodes from $HADOOP_HOME/etc/hadoop/slaves. HTH, Dr Mich Talebzadeh

Re: Does Spark use data locality information from HDFS when running in standalone mode?

2016-06-05 Thread Mich Talebzadeh

Re: Does Spark use data locality information from HDFS when running in standalone mode?

2016-06-05 Thread Marco Capuccini
On 5 June 2016 at 10:50, Marco Capuccini <marco.capucc...@farmbio.uu.se> wrote: Dear all, does Spark use data locality information from HDFS when running in standalone

Re: Does Spark use data locality information from HDFS when running in standalone mode?

2016-06-05 Thread Mich Talebzadeh
Marco Capuccini <marco.capucc...@farmbio.uu.se> wrote: > Dear all, does Spark use data locality information from HDFS when running in standalone mode? Or is running on YARN mandatory for such a purpose? I can't find this information in the docs, and on Google I am only finding

Does Spark use data locality information from HDFS when running in standalone mode?

2016-06-05 Thread Marco Capuccini
Dear all, does Spark use data locality information from HDFS when running in standalone mode? Or is running on YARN mandatory for such a purpose? I can't find this information in the docs, and on Google I am only finding contrasting opinions on that. Regards, Marco Capuccini

Re: Apache Spark data locality when integrating with Kafka

2016-02-07 Thread Yuval.Itzchakov
benefit from low IO latency and high throughput.

Re: Apache Spark data locality when integrating with Kafka

2016-02-07 Thread Diwakar Dhanuskodi
I would definitely try to avoid hosting Kafka and Spark on the same servers. Kafka and Spark will be doing a lot of IO between them, so you'll want to maximize those resources and not share them on the same server. You'll want e

Re: Apache Spark data locality when integrating with Kafka

2016-02-07 Thread أنس الليثي

Re: Apache Spark data locality when integrating with Kafka

2016-02-07 Thread Diwakar Dhanuskodi
We are using Spark in two ways: 1. YARN with Spark support, Kafka running along with data nodes. 2. Spark master and workers running with some of the Kafka brokers. Data locality is important. Regards, Diwakar

Apache Spark data locality when integrating with Kafka

2016-02-06 Thread fanooos

RE: Apache Spark data locality when integrating with Kafka

2016-02-06 Thread Diwakar Dhanuskodi
Yes, to reduce network latency. > From: fanooos <dev.fano...@gmail.com>, Subject: Apache Spark data locality when integrating with Kafka: Dears, if I wi

Re: Apache Spark data locality when integrating with Kafka

2016-02-06 Thread Koert Kuipers
Spark can benefit from data locality and will try to launch tasks on the node where the Kafka partition resides. However, I think in production many organizations run a dedicated Kafka cluster. On Sat, Feb 6, 2016 at 11:27 PM, Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> wrote:

Re: How data locality is honored when spark is running on yarn

2016-01-27 Thread Saisai Shao
this is the same for different cluster managers. Thanks, Saisai. On Thu, Jan 28, 2016 at 10:50 AM, Todd <bit1...@163.com> wrote: > Hi, I am kind of confused about how data locality is honored when Spark is running on YARN (client or cluster mode); can someone please elaborate on this? Thanks!

How data locality is honored when spark is running on yarn

2016-01-27 Thread Todd
Hi, I am kind of confused about how data locality is honored when Spark is running on YARN (client or cluster mode). Can someone please elaborate on this? Thanks!

Data Locality Issue

2015-11-15 Thread Renu Yadav
Hi, I am working on Spark 1.4, reading an ORC table using a DataFrame and converting that DF to an RDD. In the Spark UI I observe that 50% of the tasks are running at ANY locality and very few at LOCAL. What would be the possible reason for this? Please help. I have even changed the locality settings. Thanks

Re: Data Locality Issue

2015-11-15 Thread Renu Yadav
What are the parameters on which locality depends? On Sun, Nov 15, 2015 at 5:54 PM, Renu Yadav wrote: > Hi, I am working on Spark 1.4, reading an ORC table using a DataFrame and converting that DF to an RDD. In the Spark UI I observe that 50% of the tasks are running at ANY locality and

Re: How does Spark coordinate with Tachyon wrt data locality

2015-10-23 Thread Calvin Jia
Hi Shane, Tachyon provides an API to get the block locations of the file, which Spark uses when scheduling tasks. Hope this helps, Calvin. On Fri, Oct 23, 2015 at 8:15 AM, Kinsella, Shane <shane.kinse...@aspect.com> wrote: > Hi all, I am looking into how Spark hand

How does Spark coordinate with Tachyon wrt data locality

2015-10-23 Thread Kinsella, Shane
Hi all, I am looking into how Spark handles data locality wrt Tachyon. My main concern is how this is coordinated. Will it send a task based on a file loaded from Tachyon to a node that it knows has that file locally, and how does it know which nodes have what? Kind regards, Shane

Spark Streaming and Kafka MultiNode Setup - Data Locality

2015-09-21 Thread Ashish Soni
Hi all, just wanted to find out if there is any benefit to installing Kafka brokers and Spark nodes on the same machine. Is it possible that Spark can pull data from Kafka if it is local to the node, i.e. the broker or partition is on the same machine? Thanks, Ashish

Re: Spark Streaming and Kafka MultiNode Setup - Data Locality

2015-09-21 Thread Adrian Tanase
seconds. -adrian. From: Cody Koeninger <c...@koeninger.org>, Sent: Monday, September 21, 2015 10:19 PM, Subject: Re: Spark Streaming and Kafka MultiNode Setup - Data Locality: The direct stream already uses the Kafka leader for a

Re: Spark Streaming and Kafka MultiNode Setup - Data Locality

2015-09-21 Thread Cody Koeninger
The direct stream already uses the Kafka leader for a given partition as the preferred location. I don't run Kafka on the same nodes as Spark, and I don't know anyone who does, so that situation isn't particularly well tested. On Mon, Sep 21, 2015 at 1:15 PM, Ashish Soni
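A sketch of the same idea with the later spark-streaming-kafka-0-10 integration, where the broker preference Cody describes is an explicit LocationStrategy; topic names and bootstrap servers are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    val conf = new SparkConf().setAppName("kafka-locality-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "locality-demo",
      "auto.offset.reset" -> "latest")

    // PreferBrokers only helps when executors run on the broker hosts;
    // otherwise PreferConsistent is the usual choice.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferBrokers,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD(rdd => println(s"batch records = ${rdd.count()}"))
    ssc.start()
    ssc.awaitTermination()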

Re: Data locality with HDFS not being seen

2015-08-21 Thread Sameer Farooqui
Hi Sunil, have you seen this fix in Spark 1.5 that may address the locality issue: https://issues.apache.org/jira/browse/SPARK-4352? On Thu, Aug 20, 2015 at 4:09 AM, Sunil <sdhe...@gmail.com> wrote: Hello, I am seeing some unexpected issues with achieving HDFS data locality. I expect

Data locality with HDFS not being seen

2015-08-20 Thread Sunil
Hello. I am seeing some unexpected issues with achieving HDFS data locality. I expect the tasks to be executed only on the node which has the data, but this is not happening (of course, unless the node is busy, in which case I understand tasks can go to some other node). Could anyone

Poor HDFS Data Locality on Spark-EC2

2015-08-04 Thread Jerry Lam
Hi Spark users and developers, I have been trying to use spark-ec2. After I launched the Spark cluster (1.4.1) with ephemeral HDFS (using Hadoop 2.4.0), I tried to execute a job where the data is stored in the ephemeral HDFS. No matter what I tried to do, there is no data locality at all

data locality in spark

2015-04-27 Thread Grandl Robert
Hi guys, I am running some SQL queries, but all my tasks are reported as either NODE_LOCAL or PROCESS_LOCAL. In the Hadoop world, the reduce tasks are RACK or NON_RACK LOCAL because they have to aggregate data from multiple hosts. However, in Spark even the aggregation stages are reported

Re: Data locality across jobs

2015-04-02 Thread Sandy Ryza
At the end of the day, a daily job is launched, which works on the outputs of the hourly jobs. For data locality and speed, we wish that when the daily job launches, it finds all instances of a given key at a single executor rather than fetching them from others during the shuffle. Is it possible

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-04-01 Thread Haoyuan Li
Response inline. On Tue, Mar 31, 2015 at 10:41 PM, Sean Bigdatafun <sean.bigdata...@gmail.com> wrote: (resending...) I was thinking of the same setup… but the more I think about this problem, the more interesting it becomes. If we allocate 50% of total memory to Tachyon statically, then the

Data locality across jobs

2015-04-01 Thread kjsingh
Hi, we are running an hourly job using Spark 1.2 on YARN. It saves an RDD of Tuple2. At the end of the day, a daily job is launched, which works on the outputs of the hourly jobs. For data locality and speed, we wish that when the daily job launches, it finds all instances of a given key at a single
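A hedged sketch of one common approach to this: have the hourly jobs partition their output by key with a fixed partitioner, and have the daily job apply the same partitioner so every instance of a key hashes to the same partition index. The paths, key extraction, and partition count are assumptions, and whether the daily job's executors actually land next to the hourly output blocks is a separate scheduling question.

    import org.apache.spark.HashPartitioner
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("hourly-daily-sketch").getOrCreate()
    val sc = spark.sparkContext
    val partitioner = new HashPartitioner(200)   // same partitioner in both jobs

    // Hourly job: partition the Tuple2 output by key before saving.
    val hourly = sc.textFile("hdfs:///input/hour=12")
      .map(line => (line.split(",")(0), line))
      .partitionBy(partitioner)
    hourly.saveAsTextFile("hdfs:///hourly/hour=12")

    // Daily job: re-apply the same partitioner; all occurrences of a key end up
    // in the same partition, so later per-key work avoids a second wide shuffle.
    val daily = (0 to 23)
      .map(h => sc.textFile(s"hdfs:///hourly/hour=$h").map(line => (line.split(",")(0), line)))
      .reduce(_ union _)
      .partitionBy(partitioner)
    daily.groupByKey().mapValues(_.size).saveAsTextFile("hdfs:///daily/counts")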

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Sean Bigdatafun
(resending...) I was thinking of the same setup… but the more I think about this problem, the more interesting it becomes. If we allocate 50% of total memory to Tachyon statically, then the Mesos benefits of dynamically scheduling resources go away altogether. Can Tachyon be resource-managed by

deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Ankur Chauhan
Hi, I am fairly new to the Spark ecosystem and I have been trying to set up a Spark on Mesos deployment. I can't seem to figure out the best practices around HDFS and Tachyon. The documentation about Spark's data-locality section seems to point

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Haoyuan Li
deployment. I can't seem to figure out the best practices around HDFS and Tachyon. The documentation's section on Spark's data locality seems to indicate that each of my Mesos slave nodes should also run an HDFS datanode. This seems fine, but I can't seem to figure out how I would go about pushing

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Ankur Chauhan
...@brightcove.com wrote: Hi, I am fairly new to the Spark ecosystem and I have been trying to set up a Spark on Mesos deployment. I can't seem to figure out the best practices around HDFS and Tachyon. The documentation about Spark's data locality seems to indicate that each of my Mesos

Re: How does Spark honor data locality when allocating computing resources for an application

2015-03-14 Thread eric wong
data locality: // Pack each app into as few nodes as possible until we've assigned all its cores for (worker <- workers if worker.coresFree > 0 && worker.state == WorkerState.ALIVE) { for (app <- waitingApps if app.coresLeft > 0) { if (canUse(app, worker)) { val coresToUse

How does Spark honor data locality when allocating computing resources for an application

2015-03-13 Thread bit1...@163.com
Hi, sparkers. When I read the code about computing-resource allocation for a newly submitted application in the Master#schedule method, I got a question about data locality: // Pack each app into as few nodes as possible until we've assigned all its cores for (worker <- workers

Ensuring data locality when opening files

2015-03-09 Thread Daniel Haviv
Hi, we wrote a Spark Streaming app that receives file names on HDFS from Kafka and opens them using Hadoop's libraries. The problem with this method is that I'm not utilizing data locality, because any worker might open any file without giving precedence to data locality. I can't open the files
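A hedged sketch of one way to attack this (the file names and the per-file processing are placeholders): look up each file's HDFS block hosts and build the RDD of file names with those hosts as per-element location preferences via SparkContext.makeRDD, so the tasks that open the files prefer local nodes.

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("filename-locality-sketch").getOrCreate()
    val sc = spark.sparkContext

    val fileNames = Seq("hdfs:///incoming/a.bin", "hdfs:///incoming/b.bin")  // e.g. received from Kafka

    val fs = FileSystem.get(sc.hadoopConfiguration)
    val withHosts: Seq[(String, Seq[String])] = fileNames.map { name =>
      val status = fs.getFileStatus(new Path(name))
      val hosts = fs.getFileBlockLocations(status, 0, status.getLen).flatMap(_.getHosts).distinct.toSeq
      (name, hosts)
    }

    // makeRDD accepts per-element preferred locations (hostnames of Spark nodes).
    sc.makeRDD(withHosts).foreach { file =>
      println(s"opening $file on a node that should hold its blocks")
    }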

Re: Data Locality

2015-01-28 Thread hnahak

Re: Data Locality

2015-01-28 Thread Harihar Nahak

Re: data locality in logs

2015-01-28 Thread hnahak

Re: Data locality running Spark on Mesos

2015-01-11 Thread Michael V Le
as with Mesos. Looking at the logs again, it looks like the locality info between the standalone and Mesos coarse-grained modes is very similar. I must have been hallucinating earlier, thinking somehow the data locality information was different. So this whole thing might just simply be due to the fact

Re: Data locality running Spark on Mesos

2015-01-10 Thread Timothy Chen
for every task? Of course, any perceived slowdown will probably be very dependent on the workload. I just want to get a feel for the possible overhead (e.g., a factor of 2 or 3 slowdown?). If it is not a data locality issue, perhaps this overhead can be a factor in the slowdown I observed, at least

Re: Data locality running Spark on Mesos

2015-01-09 Thread Michael V Le
executors for every task? Of course, any perceived slowdown will probably be very dependent on the workload. I just want to get a feel for the possible overhead (e.g., a factor of 2 or 3 slowdown?). If it is not a data locality issue, perhaps this overhead can be a factor in the slowdown I observed, at least

Data locality running Spark on Mesos

2015-01-08 Thread mvle
especially for coarse-grained mode, as the executors supposedly do not go away until job completion. Any ideas? Thanks, Mike

Re: Data locality running Spark on Mesos

2015-01-08 Thread Tim Chen
do not go away until job completion. Any ideas? Thanks, Mike

Re: Data Locality

2015-01-06 Thread Andrew Ash
You can also read about locality here in the docs: http://spark.apache.org/docs/latest/tuning.html#data-locality On Tue, Jan 6, 2015 at 8:37 AM, Cody Koeninger <c...@koeninger.org> wrote: No, not all RDDs have location information, and in any case tasks may be scheduled on non-local nodes

Re: Data Locality

2015-01-06 Thread Cody Koeninger
is local, i.e. Node1 and Node2 (assuming Node1 and Node2 have enough resources to execute the tasks)? Gaurav

Data Locality

2015-01-06 Thread gtinside
the data is local, i.e. Node1 and Node2 (assuming Node1 and Node2 have enough resources to execute the tasks)? Gaurav

Re: data locality, task distribution

2014-11-13 Thread Nathan Kronenfeld
I am seeing skewed execution times. As far as I can tell, they are attributable to differences in data locality - tasks with locality PROCESS_LOCAL run fast, NODE_LOCAL, slower, and ANY, slowest. This seems entirely as it should be - the question is, why the different locality levels? I am

Re: data locality, task distribution

2014-11-13 Thread Aaron Davidson
. As far as I can tell, they are attributable to differences in data locality - tasks with locality PROCESS_LOCAL run fast, NODE_LOCAL, slower, and ANY, slowest. This seems entirely as it should be - the question is, why the different locality levels? I am seeing skewed caching, as I

Re: data locality, task distribution

2014-11-12 Thread Aaron Davidson
...@oculusinfo.com wrote: Can anyone point me to a good primer on how spark decides where to send what task, how it distributes them, and how it determines data locality? I'm trying a pretty simple task - it's doing a foreach over cached data, accumulating some (relatively complex) values. So I see

Re: data locality, task distribution

2014-11-12 Thread Nathan Kronenfeld
, Nathan Kronenfeld nkronenf...@oculusinfo.com wrote: Can anyone point me to a good primer on how spark decides where to send what task, how it distributes them, and how it determines data locality? I'm trying a pretty simple task - it's doing a foreach over cached data, accumulating some

Re: data locality, task distribution

2014-11-12 Thread Aaron Davidson
point me to a good primer on how spark decides where to send what task, how it distributes them, and how it determines data locality? I'm trying a pretty simple task - it's doing a foreach over cached data, accumulating some (relatively complex) values. So I see several inconsistencies I don't

data locality, task distribution

2014-11-11 Thread Nathan Kronenfeld
Can anyone point me to a good primer on how spark decides where to send what task, how it distributes them, and how it determines data locality? I'm trying a pretty simple task - it's doing a foreach over cached data, accumulating some (relatively complex) values. So I see several

problem with data locality api

2014-09-28 Thread qinwei
Hi, everyone. I have come across a problem with data locality. I found this example code in 《Spark-on-YARN-A-Deep-Dive-Sandy-Ryza.pdf》: val locData = InputFormatInfo.computePreferredLocations(Seq(new InputFormatInfo(conf, classOf[TextInputFormat], new Path("myfile.txt
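A cleaned-up sketch of that snippet (Spark 1.x-era API; the preferred-locations SparkContext constructor was later deprecated, the path is given here as a plain string, and exact signatures vary between versions). As the replies below note, the Hadoop Configuration passed to InputFormatInfo and the SparkConf are two different objects even though both are informally called "conf".

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapred.TextInputFormat
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.scheduler.InputFormatInfo

    val hadoopConf = new Configuration()   // Hadoop side: used to resolve HDFS block locations
    val sparkConf = new SparkConf().setAppName("preferred-locations-sketch")

    val locData = InputFormatInfo.computePreferredLocations(
      Seq(new InputFormatInfo(hadoopConf, classOf[TextInputFormat], "hdfs:///myfile.txt")))

    // Hints to the YARN allocator about where containers should preferably be placed.
    val sc = new SparkContext(sparkConf, locData)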

RE: problem with data locality api

2014-09-28 Thread Shao, Saisai
Subject: problem with data locality api. Hi, everyone. I have come across a problem with data locality. I found this example code in 《Spark-on-YARN-A-Deep-Dive-Sandy-Ryza.pdf》: val locData = InputFormatInfo.computePreferredLocations(Seq(new InputFormatInfo(conf, classOf[TextInputFormat

Re: RE: problem with data locality api

2014-09-28 Thread qinwei
for your reply! qinwei. From: Shao, Saisai; Sent: 2014-09-28 14:42; To: qinwei; Cc: user; Subject: RE: problem with data locality api. Hi, the first conf is used for Hadoop to determine the locality distribution of the HDFS file. The second conf is used for Spark; though they have the same name, they are actually two

Re: data locality

2014-08-30 Thread Chris Fregly
, 2014 at 4:13 AM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, In the standalone mode, how can we check data locality is working as expected when tasks are assigned? Thanks! On 23 Jul, 2014, at 12:49 am, Sandy Ryza sandy.r...@cloudera.com wrote: On standalone there is still special

Re: data locality

2014-07-22 Thread Sandy Ryza
for your patience! From: Sandy Ryza <sandy.r...@cloudera.com>, Sent: July 22, 2014 9:47, To: user@spark.apache.org, Subject: Re: data locality. This currently only works for YARN. The standalone default is to place an executor on every node

RE: data locality

2014-07-21 Thread Haopu Wang
you for your patience! From: Sandy Ryza <sandy.r...@cloudera.com>, Sent: July 22, 2014 9:47, To: user@spark.apache.org, Subject: Re: data locality. This currently only works for YARN. The standalone default is to place an executor on every node for every

data locality

2014-07-18 Thread Haopu Wang
I have a standalone Spark cluster and an HDFS cluster which share some nodes. When reading an HDFS file, how does Spark assign tasks to nodes? Will it ask HDFS for the location of each file block in order to pick the right worker node? How about a Spark cluster on YARN? Thank you very much!

Re: data locality

2014-07-18 Thread Sandy Ryza
any information about where the input data for the jobs is located. If the executors occupy significantly fewer nodes than exist in the cluster, it can be difficult for Spark to achieve data locality. The workaround for this is an API that allows passing in a set of preferred locations when

RE: data locality

2014-07-18 Thread Haopu Wang
executors to use for this application? Thanks again! From: Sandy Ryza <sandy.r...@cloudera.com>, Sent: Friday, July 18, 2014 3:44 PM, To: user@spark.apache.org, Subject: Re: data locality. Hi Haopu, Spark will ask HDFS for file block locations

Performance of Akka or TCP Socket input sources vs HDFS: Data locality in Spark Streaming

2014-06-10 Thread Nilesh Chakraborty
HDFS on the same cluster as Spark, write the data from the Actors to HDFS, and then use HDFS as input source for Spark Streaming. Does this result in better performance due to data locality (with HDFS data replication turned on)? I think performance should be almost the same with actors, since

Re: Performance of Akka or TCP Socket input sources vs HDFS: Data locality in Spark Streaming

2014-06-10 Thread Michael Cutler
fault tolerance, and the ability to checkpoint and recover even if the master fails. Cheers, Nilesh