I will look into fixing this if applicable.
>
> On Mon, Oct 10, 2016 at 11:56 AM Paweł Szulc <paul.sz...@gmail.com> wrote:
>
>> Hi,
>>
>> quick question, why is Spark using two different versions of netty?:
>>
>>
>>- io.netty:netty-all:4.0.29.Final:jar
Just realized that people have to be invited to this thing. You see,
that's why Gitter is just simpler.
I will try to figure it out ASAP
On 16 May 2016 at 15:40, "Paweł Szulc" <paul.sz...@gmail.com> wrote:
> I've just created this https://apache-spark.slack.com for ad-hoc
>
I've just created this https://apache-spark.slack.com for ad-hoc
communications within the community.
Everybody's welcome!
--
Regards,
Paul Szulc
twitter: @rabbitonweb
blog: www.rabbitonweb.com
I've just created https://apache-spark.slack.com
On Thu, May 12, 2016 at 9:28 AM, Paweł Szulc <paul.sz...@gmail.com> wrote:
> Hi,
>
> well I guess the advantage of gitter over the mailing list is the same as with
> IRC. It's not actually a replacement because the mailing list
is a bit of a scalability problem on the user@ list at
>> the moment, just because it covers all of Spark. But adding a
>> different all-Spark channel doesn't help that.
>>
>> Anyway maybe that's "why"
>>
>>
>> On Wed, May 11, 2016 at 6:26 PM, P
No answer, but maybe one more time: a Gitter channel for Spark users would
be a good idea!
On Mon, May 9, 2016 at 1:45 PM, Paweł Szulc <paul.sz...@gmail.com> wrote:
> Hi,
>
> I was wondering - why Spark does not have a gitter channel?
>
> --
> Regards,
> Paul Szulc
Hi,
I was wondering - why Spark does not have a gitter channel?
--
Regards,
Paul Szulc
twitter: @rabbitonweb
blog: www.rabbitonweb.com
Hard to imagine. Can you share a code sample?
On Tue, Dec 15, 2015 at 8:06 AM, Sushrut Ikhar
wrote:
> Hi,
> I am finding it difficult to understand the following problem:
> I count the number of records before and after applying the mapValues
> transformation for a
It is actually the number of cores. If your processor has hyperthreading then
it will be more (the number of logical processors your OS sees).
On Sun, 22 Mar 2015 at 4:51 PM, Ted Yu <yuzhih...@gmail.com> wrote:
I assume spark.default.parallelism is 4 in the VM Ashish was using.
Cheers
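For reference, a minimal sketch of checking and overriding that default (the app name and the value 8 are placeholders, not from the thread):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("parallelism-check")         // placeholder name
    .set("spark.default.parallelism", "8")   // e.g. the core count the OS reports
  val sc = new SparkContext(conf)

  // When not set explicitly, this falls back to the total number of cores
  // available to Spark (logical cores, so more with hyperthreading):
  println(sc.defaultParallelism)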
I would first check whether there is any possibility that after doing
groupByKey one of the groups does not fit in one of the executors' memory.
To back up my theory, instead of groupByKey + map try reduceByKey +
mapValues.
Let me know if that helped.
Pawel Szulc
http://rabbitonweb.com
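To illustrate the suggestion, a minimal sketch (assuming sc is the usual SparkContext from the shell; the pair RDD and its data are made up for the example):

  import org.apache.spark.rdd.RDD

  val pairs: RDD[(String, Int)] = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

  // groupByKey + map materializes every value of a key on a single executor:
  val viaGroup = pairs.groupByKey().map { case (k, vs) => (k, vs.sum) }

  // reduceByKey + mapValues combines values map-side, so no single group
  // ever has to fit in one executor's memory:
  val viaReduce = pairs.reduceByKey(_ + _).mapValues(total => s"total=$total")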
at 9:33 AM, Paweł Szulc paul.sz...@gmail.com
wrote:
I would first check whether there is any possibility that after doing
groupByKey one of the groups does not fit in one of the executors' memory.
To back up my theory, instead of groupByKey + map try reduceByKey
+ mapValues.
Let me
Currently, if you use accumulators inside actions (like foreach) you have a
guarantee that, even if a partition gets recalculated, the values will be
correct. The same does NOT apply to transformations, where you cannot
rely 100% on the values.
Pawel Szulc
On Fri, 27 Feb 2015 at 4:54 PM, Darin McBeath
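As an illustration of that accumulator guarantee, a minimal sketch using the classic accumulator API (rdd is a placeholder RDD; sc is the usual SparkContext):

  // Accumulator updated inside an action (foreach): each task's update is
  // applied exactly once, even if a partition is recomputed.
  val safeCount = sc.accumulator(0L)
  rdd.foreach(_ => safeCount += 1L)
  println(safeCount.value)

  // Accumulator updated inside a transformation (map): if the stage is
  // recomputed (speculative execution, executor failure), updates can be
  // applied more than once, so the value cannot be fully relied on.
  val riskyCount = sc.accumulator(0L)
  rdd.map { x => riskyCount += 1L; x }.count()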
Thanks for coming back to the list with a response!
On Fri, 27 Feb 2015 at 3:16 PM, Himanish Kushary <himan...@gmail.com> wrote:
Hi,
I was able to solve the issue. Putting down the settings that worked for
me.
1) It was happening due to the large number of partitions. I *coalesce*'d
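A minimal sketch of that coalesce step (the input path and the target of 200 partitions are placeholders):

  val input = sc.textFile("hdfs:///some/input")   // placeholder path
  println(input.partitions.length)                // e.g. a huge number of tiny partitions

  // coalesce shrinks the partition count without a full shuffle;
  // pass shuffle = true if the data also needs rebalancing:
  val fewer = input.coalesce(200)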
Correct me if I'm wrong, but he can actually run this code without
broadcasting the users map; however, the code will be less efficient.
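A minimal sketch of the difference (the users map and the ids RDD are made-up placeholders):

  val users: Map[Int, String] = Map(1 -> "alice", 2 -> "bob")
  val ids = sc.parallelize(Seq(1, 2, 3))

  // Without a broadcast: the map is captured in the closure and shipped
  // with every single task.
  val named = ids.map(id => users.getOrElse(id, "unknown"))

  // With a broadcast variable: the map is shipped once per executor and
  // reused by all of its tasks.
  val usersBc = sc.broadcast(users)
  val namedBc = ids.map(id => usersBc.value.getOrElse(id, "unknown"))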
On Thu, 26 Feb 2015 at 12:31 PM, Sean Owen <so...@cloudera.com> wrote:
Yes, but there is no concept of executors 'deleting' an RDD. And you
would want
Maybe you can omit grouping with groupByKey altogether? What is
your next step after grouping elements by key? Are you trying to reduce the
values? If so, then I would recommend using reducing functions such as
reduceByKey or aggregateByKey. Those will first reduce the values for
each
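For example, a hedged sketch of aggregateByKey computing a per-key average without ever grouping (keys and values are placeholders):

  val pairs = sc.parallelize(Seq(("a", 1), ("a", 5), ("b", 2)))

  val avgPerKey = pairs
    .aggregateByKey((0L, 0L))(
      (acc, v) => (acc._1 + v, acc._2 + 1),       // fold one value into (sum, count)
      (a, b)   => (a._1 + b._1, a._2 + b._2))     // merge partial (sum, count) pairs
    .mapValues { case (sum, count) => sum.toDouble / count }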
try writing the files with java.nio.file.Files.write() -- I'd
expect there is less that can go wrong with that simple call.
On Thu, Dec 11, 2014 at 12:50 PM, Paweł Szulc paul.sz...@gmail.com
wrote:
Imagine a simple Spark job that will store each line of the RDD in a
separate file
val lines
regards,
Paweł Szulc
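A rough sketch of what that could look like (the input path and the /tmp/lines output directory are placeholders, and the directory is assumed to exist on every executor):

  import java.nio.charset.StandardCharsets
  import java.nio.file.{Files, Paths}

  val lines = sc.textFile("hdfs:///some/input")   // placeholder path

  // zipWithIndex gives each line a stable number to use in the file name.
  // Note: foreach runs on the executors, so the files end up on whichever
  // machine processed the partition.
  lines.zipWithIndex().foreach { case (line, idx) =>
    Files.write(
      Paths.get(s"/tmp/lines/line-$idx.txt"),
      line.getBytes(StandardCharsets.UTF_8))
  }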
Hi,
quick question: I found this:
http://docs.sigmoidanalytics.com/index.php/Problems_and_their_Solutions#Multiple_SparkContext:Failed_to_bind_to:.2F127.0.1.1:45916
My main question: is this constraint still valid? Am I not allowed to have
two SparkContexts pointing to the same Spark Master in
Hi,
I just wanted to say hi to the Spark community. I'm developing some
stuff right now using Spark (we've started very recently). As the API
documentation of Spark is really, really good, I'd like to get deeper
knowledge of the internal stuff - you know, the goodies. Watching videos
from Spark
Just to have this clear, can you answer with a quick yes or no:
Does it mean that when I create an RDD from a file and simply iterate
through it like this:
sc.textFile("some_text_file.txt").foreach(line => println(line))
then the actual lines might come in a different order than they are in the
file?
Never mind, I've just run the code in the REPL. Indeed, if we do not sort,
then the order is totally random. Which actually makes sense if you think
about it.
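If the file order ever does matter, a minimal sketch of one way to recover it (using the same file name as the question above):

  val lines = sc.textFile("some_text_file.txt")

  // zipWithIndex numbers lines in file order, so sorting by that index
  // restores the original ordering before printing on the driver:
  lines.zipWithIndex()
    .sortBy { case (_, idx) => idx }
    .map { case (line, _) => line }
    .collect()
    .foreach(println)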
On Thu, Oct 16, 2014 at 9:58 PM, Paweł Szulc paul.sz...@gmail.com wrote:
Just to have this clear, can you answer with quick yes