Re: spark using two different versions of netty?

2016-10-10 Thread Paweł Szulc
I will look into fixing this if applicable.

On Mon, Oct 10, 2016 at 11:56 AM Paweł Szulc <paul.sz...@gmail.com> wrote:
> Hi,
>
> quick question, why is Spark using two different versions of netty?:
>
> - io.netty:netty-all:4.0.29.Final:jar

Re: Apache Spark Slack

2016-05-16 Thread Paweł Szulc
Just realized that people have to be invited to this thing. You see, that's why Gitter is just simpler. I will try to figure it out ASAP.

On 16 May 2016, 15:40, "Paweł Szulc" <paul.sz...@gmail.com> wrote:
> I've just created this https://apache-spark.slack.com for ad-hoc

Apache Spark Slack

2016-05-16 Thread Paweł Szulc
I've just created this https://apache-spark.slack.com for ad-hoc communications within the community. Everybody's welcome!

--
Regards,
Paul Szulc
twitter: @rabbitonweb
blog: www.rabbitonweb.com

Re: apache spark on gitter?

2016-05-16 Thread Paweł Szulc
I've just created https://apache-spark.slack.com

On Thu, May 12, 2016 at 9:28 AM, Paweł Szulc <paul.sz...@gmail.com> wrote:
> Hi,
>
> well I guess the advantage of gitter over mailing list is the same as with
> IRC. It's not actually a replacement because mailing list

Re: apache spark on gitter?

2016-05-12 Thread Paweł Szulc
>> is a bit of a scalability problem on the user@ list at the moment, just
>> because it covers all of Spark. But adding a different all-Spark channel
>> doesn't help that.
>>
>> Anyway maybe that's "why"
>>
>> On Wed, May 11, 2016 at 6:26 PM, P

Re: apache spark on gitter?

2016-05-11 Thread Paweł Szulc
No answer, but maybe one more time: a gitter channel for spark users would be a good idea!

On Mon, May 9, 2016 at 1:45 PM, Paweł Szulc <paul.sz...@gmail.com> wrote:
> Hi,
>
> I was wondering - why doesn't Spark have a gitter channel?
>
> --
> Regards,
> Paul Szul

apache spark on gitter?

2016-05-09 Thread Paweł Szulc
Hi, I was wondering - why doesn't Spark have a gitter channel?

--
Regards,
Paul Szulc
twitter: @rabbitonweb
blog: www.rabbitonweb.com

Re: mapValues Transformation (JavaPairRDD)

2015-12-15 Thread Paweł Szulc
Hard to imagine. Can you share a code sample?

On Tue, Dec 15, 2015 at 8:06 AM, Sushrut Ikhar wrote:
> Hi,
> I am finding it difficult to understand the following problem:
> I count the number of records before and after applying the mapValues
> transformation for a

Re: How Does aggregate work

2015-03-23 Thread Paweł Szulc
It is actually the number of cores. If your processor has hyperthreading then it will be more (the number of logical processors your OS sees).

On Sun, 22 Mar 2015, 4:51 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> I assume spark.default.parallelism is 4 in the VM Ashish was using. Cheers
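
A minimal sketch of how to check this in practice (the master URL and app name are assumptions for the example):

    import org.apache.spark.{SparkConf, SparkContext}

    // local[*] asks Spark for as many worker threads as the JVM reports
    // logical processors, which counts hyperthreaded cores twice.
    val conf = new SparkConf().setAppName("parallelism-check").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // In local mode, defaultParallelism falls back to the number of worker
    // threads unless spark.default.parallelism is set explicitly.
    println(s"defaultParallelism = ${sc.defaultParallelism}")
    sc.stop()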

Re: Problem getting program to run on 15TB input

2015-02-28 Thread Paweł Szulc
I would first check whether there is any possibility that after doing groupByKey one of the groups does not fit in an executor's memory. To test this theory, instead of doing groupByKey + map, try reduceByKey + mapValues. Let me know if that helped.

Pawel Szulc
http://rabbitonweb.com
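
A minimal sketch of the suggested rewrite, computing a per-key average (sc and the sample data are assumptions, as in the spark-shell):

    val pairs = sc.parallelize(Seq(("a", 1), ("a", 3), ("b", 2))) // hypothetical data

    // groupByKey materializes every group in memory before the map runs,
    // so one huge group can overwhelm a single executor.
    val avgViaGroup = pairs.groupByKey()
      .map { case (k, vs) => (k, vs.sum.toDouble / vs.size) }

    // reduceByKey pre-aggregates (sum, count) map-side, so no executor
    // ever has to hold a whole group; mapValues finishes the job.
    val avgViaReduce = pairs
      .mapValues(v => (v.toLong, 1L))
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (s, c) => s.toDouble / c }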

Re: Problem getting program to run on 15TB input

2015-02-28 Thread Paweł Szulc
at 9:33 AM, Paweł Szulc <paul.sz...@gmail.com> wrote:
> I would first check whether there is any possibility that after doing
> groupByKey one of the groups does not fit in an executor's memory. To test
> this theory, instead of doing groupByKey + map, try reduceByKey +
> mapValues. Let me

Re: Question about Spark best practice when counting records.

2015-02-27 Thread Paweł Szulc
Currently, if you use accumulators inside actions (like foreach), you have a guarantee that the values will be correct even if a partition is recalculated. The same does NOT apply to transformations, where you cannot rely 100% on the values.

Pawel Szulc

On Fri, 27 Feb 2015, 4:54 PM, Darin McBeath
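
A minimal sketch of the difference, using the Spark 1.x accumulator API (sc is assumed, as in the spark-shell):

    val records = sc.parallelize(1 to 1000)

    // Inside an action: Spark applies updates from successful tasks exactly
    // once, even if a task is re-run, so the final value is reliable.
    val seen = sc.accumulator(0L)
    records.foreach(_ => seen += 1L)
    println(seen.value) // always 1000

    // Inside a transformation: updates may be applied more than once if the
    // stage is recomputed, e.g. when the RDD is evaluated by two actions.
    val touched = sc.accumulator(0L)
    val mapped = records.map { x => touched += 1L; x * 2 }
    mapped.count()
    mapped.count() // re-runs the map: 'touched' may now read ~2000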

Re: High CPU usage in Driver

2015-02-27 Thread Paweł Szulc
Thanks for coming back to the list with the response!

On Fri, 27 Feb 2015, 3:16 PM, Himanish Kushary <himan...@gmail.com> wrote:
> Hi,
> I was able to solve the issue. Putting down the settings that worked for me.
> 1) It was happening due to the large number of partitions. I *coalesce*'d
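
A minimal sketch of the coalesce step described above (paths and numbers are made up for illustration):

    // An RDD read from many small files can end up with thousands of
    // partitions, and the driver pays a scheduling cost for every task.
    val wide = sc.textFile("hdfs:///data/many-small-files/*")
    println(wide.partitions.length)

    // coalesce(n) merges partitions without a shuffle when shrinking,
    // cutting the number of tasks the driver has to schedule and track.
    val narrow = wide.coalesce(200)
    println(narrow.partitions.length) // at most 200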

Re: CollectAsMap, Broadcasting.

2015-02-26 Thread Paweł Szulc
Correct me if I'm wrong, but he can actually run this code without broadcasting the users map; however, the code will be less efficient.

On Thu, 26 Feb 2015, 12:31 PM, Sean Owen <so...@cloudera.com> wrote:
> Yes, but there is no concept of executors 'deleting' an RDD. And you would want
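
A minimal sketch of the trade-off being discussed (the users map and the events data are hypothetical):

    val users: Map[Int, String] = Map(1 -> "alice", 2 -> "bob")
    val events = sc.parallelize(Seq((1, "login"), (2, "logout")))

    // Without broadcasting: 'users' is captured in the closure and shipped
    // with every task. This works; it is just wasteful for large maps.
    val plain = events.map { case (id, ev) => (users.getOrElse(id, "?"), ev) }

    // With broadcasting: 'users' is shipped to each executor once and
    // reused by all tasks running there.
    val usersB = sc.broadcast(users)
    val efficient = events.map { case (id, ev) => (usersB.value.getOrElse(id, "?"), ev) }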

Re: Is there a limit to the number of RDDs in a Spark context?

2015-02-18 Thread Paweł Szulc
Maybe you can avoid grouping with groupByKey altogether? What is your next step after grouping elements by key? Are you trying to reduce the values? If so, I would recommend using reducing functions such as reduceByKey or aggregateByKey. Those will first reduce values for each
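
A minimal sketch of aggregateByKey standing in for groupByKey (the visits data is hypothetical):

    // Hypothetical pairs: (userId, pageId) visits.
    val visits = sc.parallelize(Seq((1, "a"), (1, "b"), (1, "a"), (2, "c")))

    // aggregateByKey reduces map-side before shuffling: seqOp folds one
    // value into the per-partition accumulator, combOp merges accumulators.
    val distinctPages = visits.aggregateByKey(Set.empty[String])(
      (acc, page) => acc + page, // seqOp: add one value to the running set
      (a, b) => a ++ b           // combOp: merge sets from two partitions
    )
    // (1, Set(a, b)), (2, Set(c)) -- without ever building full groups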

Re: Can spark job have side effects (write files to FileSystem)

2014-12-15 Thread Paweł Szulc
Try writing the files with java.nio.file.Files.write() -- I'd expect there is less that can go wrong with that simple call.

On Thu, Dec 11, 2014 at 12:50 PM, Paweł Szulc <paul.sz...@gmail.com> wrote:
> Imagine a simple Spark job that will store each line of the RDD to a
> separate file:
>
> val lines
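
A minimal sketch of that suggestion (the input path, output directory, and naming scheme are assumptions):

    import java.nio.charset.StandardCharsets
    import java.nio.file.{Files, Paths, StandardOpenOption}

    val lines = sc.textFile("hdfs:///input/data.txt")

    // Side effect inside an action: each executor writes to its own local
    // filesystem. Files.write creates the file in one call, and
    // TRUNCATE_EXISTING keeps a retried task from leaving partial output.
    lines.zipWithIndex().foreach { case (line, idx) =>
      val path = Paths.get(s"/tmp/lines/line-$idx.txt") // directory must exist
      Files.write(path, line.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)
    }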

Can spark job have side effects (write files to FileSystem)

2014-12-11 Thread Paweł Szulc
regards, Paweł Szulc

multiple spark context in same driver program

2014-11-06 Thread Paweł Szulc
Hi, a quick question: I found this: http://docs.sigmoidanalytics.com/index.php/Problems_and_their_Solutions#Multiple_SparkContext:Failed_to_bind_to:.2F127.0.1.1:45916 My main question: is this constraint still valid? Am I not allowed to have two SparkContexts pointing to the same Spark Master in

hi all

2014-10-16 Thread Paweł Szulc
Hi, I just wanted to say hi to the Spark community. I'm developing some stuff right now using Spark (we started very recently). As the API documentation of Spark is really good, I'd like to get deeper knowledge of the internal stuff - you know, the goodies. Watching videos from Spark

Re: reverse an rdd

2014-10-16 Thread Paweł Szulc
Just to have this clear, can you answer with a quick yes or no: Does it mean that when I create an RDD from a file and simply iterate through it like this: sc.textFile("some_text_file.txt").foreach(line => println(line)) then the actual lines might come in a different order than they are in the file?

Re: reverse an rdd

2014-10-16 Thread Paweł Szulc
Never mind, I've just run the code in the REPL. Indeed, if we do not sort, then the order is totally random. Which actually makes sense if you think about it.

On Thu, Oct 16, 2014 at 9:58 PM, Paweł Szulc <paul.sz...@gmail.com> wrote:
> Just to have this clear, can you answer with a quick yes
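
A minimal sketch of the REPL experiment (the file name is an assumption; sc comes from the spark-shell):

    val lines = sc.textFile("some_text_file.txt")

    // foreach runs on the executors; partitions are processed concurrently,
    // so lines print in whatever order tasks happen to finish.
    lines.foreach(line => println(line))

    // To recover file order, pair each line with its position and sort.
    lines.zipWithIndex()
      .sortBy { case (_, idx) => idx }
      .map { case (line, _) => line }
      .collect()
      .foreach(println)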