Hi Jasbir,
Yes, you are right. Do you have any idea about my question?
Thanks,
Fei
On Mon, Jan 16, 2017 at 12:37 AM, wrote:
Hi,
Coalesce is used to decrease the number of partitions. If you give a value of
numPartitions greater than the current number of partitions, I don't think the
RDD's number of partitions will be increased.
Thanks,
Jasbir
From: Fei Hu [mailto:hufe...@gmail.com]
Sent: Sunday, January 15, 2017 10:10 PM
To:
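A minimal sketch of the behavior Jasbir describes, assuming a recent
spark-shell session (sc is the usual SparkContext):

  // coalesce without a shuffle can only decrease the number of partitions
  val rdd = sc.parallelize(1 to 100, 8)               // 8 partitions
  rdd.coalesce(4).getNumPartitions                    // 4: decreased as expected
  rdd.coalesce(16).getNumPartitions                   // still 8: no increase without a shuffle
  rdd.coalesce(16, shuffle = true).getNumPartitions   // 16: a shuffle allows growth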
Wondering whether it’ll be possible to do structured logging in Spark.
Adding "org.apache.logging.log4j" % "log4j-slf4j-impl" % “2.6.2” makes it
to complain about multiple bindings for slf4j
cheers
Appu
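One hedged way out, assuming the clash comes from Spark's bundled
slf4j-log4j12 binding (the exclusion below is an assumption, not a
confirmed fix), in build.sbt:

  // build.sbt sketch: keep only log4j-slf4j-impl on the classpath
  libraryDependencies += "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2"
  excludeDependencies += ExclusionRule("org.slf4j", "slf4j-log4j12")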
Waiting for suggestions/help on this...
On Wed, Jan 11, 2017 at 12:14 PM, Raju Bairishetti wrote:
> Hello,
>
> Spark SQL is generating the query plan with all partition information even
> if we apply filters on partitions in the query. Due to this, the Spark
> driver/hive
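For context, a sketch of the pattern Raju describes, assuming a Spark
1.6-style sqlContext (table and partition-column names are hypothetical):

  // dt is a hypothetical partition column of a hypothetical Hive table `logs`
  val df = sqlContext.sql("SELECT * FROM logs WHERE dt = '2017-01-15'")
  df.explain(true)  // the reported issue: the plan still carries every partition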
Thanks Raju.
On Sun, Jan 15, 2017 at 9:49 PM, Raju Bairishetti wrote:
Total number of tasks in the stage is: 21428
Number of tasks completed so far: 44
Number of tasks running now: 48
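In other words, the numbers in the console progress bar read like this (a
sketch of the format):

  [Stage 2:>                (44 + 48) / 21428]
     2     -> stage id
     44    -> tasks completed so far
     48    -> tasks currently running
     21428 -> total tasks in the stage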
On Mon, Jan 16, 2017 at 11:41 AM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
Hi,
When running Spark jobs, I see the numbers in the stages. Can anyone tell me
what these numbers indicate in the case below?
[Stage 2:> (44 + 48) / 21428]
i.e., 44 + 48 and 21428.
Thanks,
Asmath
I've also written a small blog post that may help you out:
https://medium.com/@therevoltingx/test-driven-development-w-apache-spark-746082b44941#.ia6stbl6n
On Sun, Jan 15, 2017 at 12:13 PM, Silvio Fiorito
wrote:
No worries. I also faced the issue a while back, and good people in the
community helped me :)
On Mon, Jan 16, 2017 at 9:55 AM, Md. Rezaul Karim <
rezaul.ka...@insight-centre.org> wrote:
Hi Ayan,
Thanks a million.
Regards,
_
*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html
archive.apache.org will always have all the releases:
http://archive.apache.org/dist/spark/
@Spark users: it may be a good idea to have a "To download older versions,
click here" link on the Spark download page?
On Mon, Jan 16, 2017 at 8:16 AM, Md. Rezaul Karim <
rezaul.ka...@insight-centre.org>
Hi,
I am looking for Spark version 1.2.0. I tried to download it from the Spark
website, but it's no longer available.
Any suggestion?
Regards,
_
*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
You should check out Holden’s excellent spark-testing-base package:
https://github.com/holdenk/spark-testing-base
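A minimal sketch of what a test looks like with it (the artifact version is
an assumption; check the project README for the one matching your Spark):

  // build.sbt (version string is an assumption):
  // libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "2.0.0_0.4.7" % "test"
  import com.holdenkarau.spark.testing.SharedSparkContext
  import org.scalatest.FunSuite

  class WordCountSuite extends FunSuite with SharedSparkContext {
    test("counting words") {
      // SharedSparkContext provides `sc` and reuses it across tests
      val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
      assert(counts("a") === 2)
    }
  }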
From: A Shaikh
Date: Sunday, January 15, 2017 at 1:14 PM
To: User
Subject: TDD in Spark
use yarn :)
"spark-submit --master yarn"
On Sun, Jan 15, 2017 at 7:55 PM, Darren Govoni wrote:
So what was the answer?
Sent from my Verizon, Samsung Galaxy smartphone
Original message From: Andrew Holway
Date: 1/15/17 11:37 AM (GMT-05:00) To: Marco
Mistroni Cc: Neil Jonkers , User
Hi Anastasios,
Thanks for your information. I will look into the CoalescedRDD code.
Thanks,
Fei
On Sun, Jan 15, 2017 at 12:21 PM, Anastasios Zouzias
wrote:
What's the most popular testing approach for Spark apps? I am looking for
something along the lines of TDD.
Hi Fei,
I looked at the code of CoalescedRDD and probably what I suggested will not
work.
Speaking of which, CoalescedRDD is private[spark]. If this were not the
case, you could set balanceSlack to 1 and get what you requested, see
Hi Anastasios,
Thanks for your reply. If I just increase numPartitions to twice the current
number, how does coalesce(numPartitions: Int, shuffle: Boolean = false) keep
the data locality? Do I need to define my own Partitioner?
Thanks,
Fei
On Sun, Jan 15, 2017 at 3:58 AM, Anastasios Zouzias
Darn. I didn't respond to the list. Sorry.
On Sun, Jan 15, 2017 at 5:29 PM, Marco Mistroni wrote:
Hi Rishi,
Thanks for your reply! The RDD has 24 partitions, and the cluster has a
master node + 24 computing nodes (12 cores per node). Each node will have a
partition, and I want to split each partition into two sub-partitions on the
same node to improve the parallelism and achieve high data locality.
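A hedged sketch of one way to get the two-way split (an assumption on my
part, not something confirmed in this thread; note that partitionBy
shuffles, so node-level locality is left to the scheduler rather than
guaranteed):

  import org.apache.spark.Partitioner

  // maps old partition i to new partitions 2*i and 2*i + 1
  class SplitInTwo(oldParts: Int) extends Partitioner {
    def numPartitions: Int = oldParts * 2
    def getPartition(key: Any): Int = {
      val (oldId, half) = key.asInstanceOf[(Int, Int)]
      oldId * 2 + half
    }
  }

  val rdd = sc.parallelize(1 to 1000, 24)
  val split = rdd
    .mapPartitionsWithIndex { (i, it) =>
      it.zipWithIndex.map { case (x, j) => ((i, j % 2), x) }  // tag each element with a half
    }
    .partitionBy(new SplitInTwo(24))
    .values
  split.getNumPartitions  // 48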
Thanks Neil. I followed the original suggestion from Andrew and everything is
working fine now.
kr
On Sun, Jan 15, 2017 at 4:27 PM, Neil Jonkers wrote:
Hello,
Can you drop the URL:
spark://master:7077
That URL is used when running Spark in standalone mode.
Regards
Original message From: Marco Mistroni
Date:15/01/2017 16:34 (GMT+02:00)
To: User Subject: Running Spark
on EMR
Hi all,
Could anyone assist here?
I am trying to run Spark 2.0.0 on an EMR cluster, but I am having issues
connecting to the master node. Below is a snippet of what I am doing:
sc = SparkSession.builder.master(sparkHost).appName("DataProcess").getOrCreate()
sparkHost is passed as input.
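For reference, a hedged sketch of the fix Neil and Andrew point to: on EMR,
target YARN rather than a standalone spark:// URL (usually by leaving master
out of the code entirely and passing --master yarn to spark-submit; the
Scala form below is just an illustration):

  import org.apache.spark.sql.SparkSession

  // "yarn" replaces the standalone spark://master:7077 URL
  val spark = SparkSession.builder()
    .master("yarn")
    .appName("DataProcess")
    .getOrCreate()

Note the builder returns a SparkSession, not a SparkContext, so spark is a
clearer variable name than sc.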
The biggest thing that any resource manager besides Spark's standalone one
can do is manage other applications' resources. In a cluster where you are
running other workloads, you can't use Spark standalone to arbitrate resource
requirements across apps.
On Sun, Jan 15, 2017 at 1:55 PM
Hi,
What can Mesos or YARN do that Spark standalone cannot do?
Thanks!
Hi Fei,
Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395
coalesce is mostly used for reducing the number of partitions before
writing to HDFS, but it might still be a
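To make that usual use concrete, a small sketch (the output path is
hypothetical):

  // compact many small partitions into fewer files before writing (no full shuffle)
  rdd.coalesce(4).saveAsTextFile("hdfs:///tmp/output")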