Great news, thank you very much!
On Thu, Nov 8, 2018, 5:19 PM Stavros Kontopoulos <
stavros.kontopou...@lightbend.com wrote:
> Awesome!
>
> On Thu, Nov 8, 2018 at 9:36 PM, Jules Damji wrote:
>
>> Indeed!
>>
>> Sent from my iPhone
>> Pardon the dumb thumb typos :)
>>
>> On Nov 8, 2018, at 11:31
Hello
SBT's incremental compilation has been a huge plus for building Spark + Scala
applications for some time. It seems Maven can also support incremental
compilation with a Zinc server. Considering that, I am interested to know the
community's experience -
1. The Spark documentation says SBT is being used
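For what it's worth, Spark's own Maven wrapper already wires up Zinc; a minimal sketch of the documented build flow (behavior is version-dependent, so treat the exact commands as illustrative):

```shell
# Spark's bundled Maven wrapper downloads and manages a Zinc server,
# so repeated builds get incremental Scala compilation.
./build/mvn -DskipTests clean package
```

Plain `mvn` without the wrapper only gets incremental compilation if the scala-maven-plugin is configured to use a running Zinc server.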
Hello
Has anyone used Spark to solve minimum-cost flow problems? I am quite
new to combinatorial optimization algorithms, so any help, suggestions,
or libraries would be much appreciated.
Thanks
Swapnil
Ping... Can someone please confirm whether this is an issue or not?
-
Swapnil
On Thu, Aug 31, 2017 at 12:27 PM, Swapnil Shinde <swapnilushi...@gmail.com>
wrote:
> Hello All
>
> I am observing some strange results with aggregateByKey API which is
> implemented with comb
Hello All
I am observing some strange results with the aggregateByKey API, which is
implemented with combineByKey. I am not sure whether this is by design or a
bug. I created this toy example, but the same problem can be observed on
large datasets as well -
case class ABC(key: Int, c1: Int, c2: Int)
case class
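The toy example itself was cut off in the archive, but one common source of "strange" aggregateByKey results is that the zero value is folded into every partition's partial result, not applied once per key, so a non-neutral zero makes the answer depend on the partition count. A minimal sketch (the data and names here are illustrative, not from the original post):

```scala
import org.apache.spark.sql.SparkSession

object AggregateByKeyDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("aggregateByKey-demo").getOrCreate()
    val sc = spark.sparkContext

    // Three values for key "a", spread across three partitions.
    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("a", 3)), numSlices = 3)

    // zeroValue = 10 is applied once per partition that contains the key,
    // so the total is 3*10 + (1+2+3) = 36 here, not 10 + 6 = 16.
    val summed = pairs.aggregateByKey(10)(_ + _, _ + _).collect()
    summed.foreach(println)

    spark.stop()
  }
}
```

This is documented behavior (by design, not a bug): use a neutral zero value (0 for sums, etc.) if the result must be independent of partitioning.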
Hello
I am using Spark 2.0.1 and saw that the CSV file format stores output with
the job UUID in the file name.
https://github.com/apache/spark/blob/v2.0.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala#L191
I want to avoid the CSV writer putting the job UUID in it. Is there any property
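I am not aware of a Spark 2.0.x property that drops the job UUID from CSV part-file names; a common workaround is to rename the part files after the write using the Hadoop FileSystem API. A hedged sketch (the directory layout and helper name are illustrative):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Rename part-r-00000-<uuid>.csv style files produced by df.write.csv(...)
// to plain part-00000.csv names. Assumes the write has already completed.
def stripJobUuid(outputDir: String): Unit = {
  val fs = FileSystem.get(new Configuration())
  fs.listStatus(new Path(outputDir))
    .map(_.getPath)
    .filter(_.getName.startsWith("part-"))
    .sortBy(_.getName)
    .zipWithIndex
    .foreach { case (src, i) =>
      val dst = new Path(src.getParent, f"part-$i%05d.csv")
      fs.rename(src, dst)
    }
}
```

This runs on the driver after the job finishes, so it adds no shuffle or recomputation.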
-03-08 2:45 GMT+08:00 Swapnil Shinde <swapnilushi...@gmail.com>:
>
>> Hello all
>> I have a Spark job that reads Parquet data and partitions it based on
>> one of the columns. I made sure the partitions are equally distributed and
>> not skewed. My code looks like this -
>>
Hello all
I have a Spark job that reads Parquet data and partitions it based on one
of the columns. I made sure the partitions are equally distributed and not
skewed. My code looks like this -
datasetA.write.partitionBy("column1").parquet(outputPath)
Execution plan - [inline screenshot not preserved in the archive]
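Since the plan screenshot did not survive, here is a hedged sketch of a common refinement for this write pattern: clustering rows by the partition column first, so each output directory is written by a single task instead of every task writing a file into every directory (`datasetA`, `column1`, and `outputPath` are the names from the original post; the technique is an assumption about what was being asked):

```scala
import org.apache.spark.sql.DataFrame

// One shuffle up front, then each task owns one value of column1,
// avoiding the many-small-files pattern under each partition directory.
def writePartitioned(datasetA: DataFrame, outputPath: String): Unit = {
  datasetA
    .repartition(datasetA("column1"))
    .write
    .partitionBy("column1")
    .parquet(outputPath)
}
```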
Hello All
I am facing a FileNotFoundException for a shuffle index file when running a
job with large data. The same job runs fine with smaller datasets. These
are my cluster specifications -
No of nodes - 19
Total cores - 380
Memory per executor - 32G
Spark 1.6 (MapR distribution)
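A missing shuffle index file on large inputs is often a symptom of an executor dying mid-shuffle (typically killed for exceeding its memory) rather than a filesystem problem. A hedged starting point, with illustrative values that need tuning for this cluster:

```shell
spark-submit \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --conf spark.network.timeout=600s \
  --conf spark.shuffle.io.maxRetries=10 \
  --conf spark.shuffle.io.retryWait=30s \
  ...
```

Checking the executor logs for OOM kills or lost-executor messages should confirm whether this is the cause before tuning further.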
If it is a broadcast join,
> you will see it in explain.
>
> On Sat, Nov 26, 2016 at 10:51 AM, Swapnil Shinde <swapnilushi...@gmail.com
> > wrote:
>
>> Hello
>> I am trying a broadcast join on dataframes but it is still doing
>> SortMergeJoin. I even tr
Hello
I am trying a broadcast join on DataFrames but it is still doing a
SortMergeJoin. I even tried setting spark.sql.autoBroadcastJoinThreshold
higher but still no luck.
Related piece of code -
val c = a.join(broadcast(b), "id")
On a side note, if I do SizeEstimator.estimate(b) and it
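As the reply above notes, the plan itself is the ground truth. A minimal, self-contained sketch for verifying the hint (the tiny DataFrames are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder()
  .master("local[*]").appName("broadcast-join-check").getOrCreate()
import spark.implicits._

val a = Seq((1, "x"), (2, "y")).toDF("id", "va")
val b = Seq((1, "p"), (2, "q")).toDF("id", "vb")

// The broadcast() hint on the build side forces a BroadcastHashJoin
// regardless of spark.sql.autoBroadcastJoinThreshold.
val c = a.join(broadcast(b), "id")
c.explain() // look for BroadcastHashJoin in the physical plan
```

On the SizeEstimator point: the threshold is compared against the optimizer's estimated plan size in bytes (driven by statistics), not against `SizeEstimator.estimate` of the in-memory object, so the two numbers can legitimately disagree.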
Hello
I am trying to do an inner join with a broadcast hint and getting the below
exception -
I tried to increase "sqlContext.conf.autoBroadcastJoinThreshold" but still
no luck.
Code snippet -
val dpTargetUvOutput =
pweCvfMUVDist.as("a").join(broadcast(sourceAssgined.as("b")), $"a.web_id"
===
if I am wrong.
On Fri, Aug 28, 2015 at 1:12 AM, Swapnil Shinde swapnilushi...@gmail.com
wrote:
Thanks, Rishitesh!
1. I get that the driver doesn't need to be on the master, but there is a lot
of communication between the driver and the cluster. That's why a co-located
gateway was recommended. How much
Hello
I am new to the Spark world and recently started exploring standalone mode.
It would be great if I could get clarification on the doubts below -
1. Driver locality - It is mentioned in the documentation that client
deploy mode is not good if the machine running spark-submit is not co-located
with a worker
.
On Thursday, August 27, 2015, Swapnil Shinde swapnilushi...@gmail.com
wrote: