So what was the answer?
Sent from my Verizon, Samsung Galaxy smartphone
-------- Original message --------
From: Andrew Holway
Date: 1/15/17 11:37 AM (GMT-05:00)
To: Marco Mistroni
Cc: Neil Jonkers, User
Hi,
I am looking for Spark version 1.2.0. I tried to download it from the Spark
website, but it's no longer available.
Any suggestions?
Regards,
_
*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
use yarn :)
"spark-submit --master yarn"
On Sun, Jan 15, 2017 at 7:55 PM, Darren Govoni wrote:
> So what was the answer?
You should check out Holden’s excellent spark-testing-base package:
https://github.com/holdenk/spark-testing-base
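For example, a minimal ScalaTest suite using its SharedSparkContext trait
might look like this (a sketch; the suite and test names are made up):

import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

class WordCountSuite extends FunSuite with SharedSparkContext {
  test("counts words in a small RDD") {
    // SharedSparkContext provides a SparkContext as `sc`
    val rdd = sc.parallelize(Seq("a", "b", "a"))
    assert(rdd.countByValue() === Map("a" -> 2L, "b" -> 1L))
  }
}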
From: A Shaikh
Date: Sunday, January 15, 2017 at 1:14 PM
To: User
Subject: TDD in Spark
What's the most popular testing approach for a Spark app? I am looking for
something along the lines of TDD.
Hi Anastasios,
Thanks for the information. I will look into the CoalescedRDD code.
Thanks,
Fei
On Sun, Jan 15, 2017 at 12:21 PM, Anastasios Zouzias
wrote:
> Hi Fei,
>
> I looked at the code of CoalescedRDD and probably what I suggested will
> not work.
>
> Speaking of
Hi Ayan,
Thanks a million.
Regards,
_
*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html
No worries. I also faced the issue a while back, and good people in the
community helped me :)
On Mon, Jan 16, 2017 at 9:55 AM, Md. Rezaul Karim
<rezaul.ka...@insight-centre.org> wrote:
> Hi Ayan,
>
> Thanks a million.
>
> Regards,
> _
> *Md. Rezaul Karim*,
The biggest thing a resource manager other than Spark's standalone manager
can do is manage resources for other applications too. In a cluster where
you are running other workloads, you can't use Spark standalone to
arbitrate resource requirements across apps.
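For example (the flags are illustrative and the queue name is
hypothetical), YARN can arbitrate capacity across applications through its
scheduler queues:

spark-submit --master yarn --queue analytics \
  --num-executors 10 --executor-memory 4g --executor-cores 2 \
  --class com.example.MyApp myapp.jar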
On Sun, Jan 15, 2017 at 1:55 PM
Hi Fei,
Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395
coalesce is mostly used for reducing the number of partitions before
writing to HDFS, but it might still be a
Hello,
Can you drop the URL:
spark://master:7077
That URL is used when running Spark in standalone mode.
Regards
-------- Original message --------
From: Marco Mistroni
Date: 15/01/2017 16:34 (GMT+02:00)
To: User
Subject: Running Spark on EMR
Hi Rishi,
Thanks for your reply! The RDD has 24 partitions, and the cluster has a
master node plus 24 compute nodes (12 cores per node). Each node will have
one partition, and I want to split each partition into two sub-partitions
on the same node to improve the parallelism and achieve high data
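For reference, given the 24-partition RDD in question as `rdd`, the
straightforward ways to double the partition count both involve a shuffle,
which is exactly the locality concern here (a sketch):

// Going from 24 to 48 partitions; both forms trigger a shuffle,
// so splitting in place on the same node is not guaranteed.
val doubled = rdd.repartition(48)
val alsoDoubled = rdd.coalesce(48, shuffle = true) // same as repartition(48)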
Thanks Neil. I followed the original suggestion from Andrew and everything
is working fine now.
kr
On Sun, Jan 15, 2017 at 4:27 PM, Neil Jonkers wrote:
> Hello,
>
> Can you drop the URL:
>
> spark://master:7077
>
> That URL is used when running Spark in standalone mode.
Darn. I didn't respond to the list. Sorry.
On Sun, Jan 15, 2017 at 5:29 PM, Marco Mistroni wrote:
> Thanks Neil. I followed the original suggestion from Andrew and
> everything is working fine now.
> kr
>
> On Sun, Jan 15, 2017 at 4:27 PM, Neil Jonkers
Hi Anastasios,
Thanks for your reply. If I just double numPartitions, how does
coalesce(numPartitions: Int, shuffle: Boolean = false) keep data locality?
Do I need to define my own Partitioner?
Thanks,
Fei
On Sun, Jan 15, 2017 at 3:58 AM, Anastasios Zouzias
Hi Fei,
I looked at the code of CoalescedRDD, and probably what I suggested will
not work.
Speaking of which, CoalescedRDD is private[spark]. If this were not the
case, you could set balanceSlack to 1 and get what you requested; see
Hi all,
Could anyone assist here?
I am trying to run Spark 2.0.0 on an EMR cluster, but I am having issues
connecting to the master node. Below is a snippet of what I am doing:
sc = SparkSession.builder.master(sparkHost).appName("DataProcess").getOrCreate()
sparkHost is passed as input.
Hi,
What can Mesos or YARN do that Spark standalone cannot?
Thanks!
archive.apache.org will always have all the releases:
http://archive.apache.org/dist/spark/
@Spark users: it may be a good idea to add a "To download older versions,
click here" link to the Spark download page?
On Mon, Jan 16, 2017 at 8:16 AM, Md. Rezaul Karim <
rezaul.ka...@insight-centre.org>
I've also written a small blog post that may help you out:
https://medium.com/@therevoltingx/test-driven-development-w-apache-spark-746082b44941#.ia6stbl6n
On Sun, Jan 15, 2017 at 12:13 PM, Silvio Fiorito
wrote:
> You should check out Holden’s excellent
Hi,
When running Spark jobs, I see the numbers in the stages. Can anyone tell
me what these numbers indicate in the case below?
[Stage 2:> (44 + 48) / 21428]
That is, 44 + 48 and 21428.
Thanks,
Asmath
Total number of tasks in the stage is: 21428
Number of tasks completed so far: 44
Number of tasks running now: 48
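In other words, the progress bar reads:
[Stage N:> (completed + running) / totalTasks]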
On Mon, Jan 16, 2017 at 11:41 AM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
> Hi,
>
> When running Spark jobs, I see the numbers in the stages. Can anyone
> tell what
Wondering whether it'll be possible to do structured logging in Spark.
Adding "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2" makes it
complain about multiple bindings for SLF4J.
cheers
Appu
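If the extra binding comes from Spark's bundled slf4j-log4j12 (an
assumption worth verifying in your dependency report), one possible fix is
a build.sbt sketch like:

// Let log4j-slf4j-impl be the only SLF4J binding on the classpath
libraryDependencies += "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2"
excludeDependencies += "org.slf4j" % "slf4j-log4j12"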
Hi,
Coalesce is used to decrease the number of partitions. If you give a
numPartitions value greater than the current partition count, I don't
think the RDD's number of partitions will be increased.
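A quick spark-shell check of this behaviour (a sketch; note the default is
shuffle = false, and passing shuffle = true does let the count grow):

val rdd = sc.parallelize(1 to 100, 24)
rdd.coalesce(48).getNumPartitions                  // still 24: no shuffle
rdd.coalesce(48, shuffle = true).getNumPartitions  // 48: shuffle allows growth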
Thanks,
Jasbir
From: Fei Hu [mailto:hufe...@gmail.com]
Sent: Sunday, January 15, 2017 10:10 PM
To:
Hi Jasbir,
Yes, you are right. Do you have any idea about my question?
Thanks,
Fei
On Mon, Jan 16, 2017 at 12:37 AM, wrote:
> Hi,
>
> Coalesce is used to decrease the number of partitions. If you give the
> value of numPartitions greater than the current
Thanks Raju.
On Sun, Jan 15, 2017 at 9:49 PM, Raju Bairishetti wrote:
> Total number of tasks in the stage is: 21428
> Number of tasks completed so far: 44
> Number of tasks running now: 48
>
> On Mon, Jan 16, 2017 at 11:41 AM, KhajaAsmath Mohammed <
> mdkhajaasm...@gmail.com>
Waiting for suggestions/help on this...
On Wed, Jan 11, 2017 at 12:14 PM, Raju Bairishetti wrote:
> Hello,
>
> Spark SQL is generating a query plan with all partition information even
> when we apply filters on partitions in the query. Due to this, the Spark
> driver/Hive
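For anyone reproducing this, a minimal illustration (using a Spark 2.x
session; the table and partition column names are hypothetical):
explain(true) shows whether the plan lists every partition or only the
filtered one.

// Hypothetical Hive table `events`, partitioned by `dt`
spark.sql("SELECT * FROM events WHERE dt = '2017-01-15'").explain(true)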