reduceByKey to get all associated values

2014-08-07 Thread Konstantin Kudryavtsev
to sort it in a particular way and apply some business logic. Thank you in advance, Konstantin Kudryavtsev
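
A minimal sketch of one common approach, assuming a hypothetical (key, value) RDD (names are illustrative, not from the thread): groupByKey gathers every value associated with a key so per-key sorting and business logic can run, and a reduceByKey variant builds the same per-key collections with map-side combining.

    import org.apache.spark.SparkContext._

    val pairs = sc.parallelize(Seq(("a", 3), ("a", 1), ("b", 2)))

    // Collect all values per key, then sort them within each key.
    val sorted = pairs.groupByKey().mapValues(_.toSeq.sorted)

    // reduceByKey variant: reduce single-element collections together.
    val viaReduce = pairs.mapValues(Seq(_)).reduceByKey(_ ++ _)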

Re: Ports required for running spark

2014-07-31 Thread Konstantin Kudryavtsev
. On Thu, Jul 31, 2014 at 6:17 PM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Larry, I'm afraid this is standalone mode; I'm interested in YARN. Also, I don't see the troublesome port 33007, which I believe is related to Akka. Thank you, Konstantin Kudryavtsev On Thu, Jul 31
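
Most Spark ports are chosen at random unless pinned; a sketch of fixing the driver's Akka port through SparkConf so a firewall rule can cover it (the property name is from the Spark 1.x configuration docs; the port number is the one mentioned in the thread):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("ports-example")        // app name is illustrative
      .set("spark.driver.port", "33007")  // pin the driver's Akka port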

Spark scheduling with Capacity scheduler

2014-07-17 Thread Konstantin Kudryavtsev
you, Konstantin Kudryavtsev
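
For YARN's CapacityScheduler, a Spark job is normally directed at a particular queue at submit time. A hedged sketch (the queue, class, and jar names are hypothetical):

    # --queue names the CapacityScheduler queue the job should run in.
    spark-submit --master yarn-cluster \
      --queue analytics \
      --class com.example.MyApp myapp.jar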

Filtering data during the read

2014-07-09 Thread Konstantin Kudryavtsev
to apply filtering during the read step and avoid putting all objects into memory? Thank you, Konstantin Kudryavtsev
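
Because RDD transformations are lazy, a filter chained directly onto the input is applied as records stream in, so the unfiltered dataset is never fully materialized. A minimal sketch (the path and predicate are illustrative):

    val lines   = sc.textFile("hdfs:///data/input")
    val matches = lines.filter(_.contains("ERROR"))
    matches.count() // the file is only read here, one partition at a time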

java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Konstantin Kudryavtsev
Spark 1.0. In map I create a new object each time; as I understand it, I can't reuse an object the way I could in MapReduce development? I wondered if you could point out how the GC overhead can be avoided... thank you in advance. Thank you, Konstantin Kudryavtsev
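
One common way to cut per-record allocation is mapPartitions, which lets a single mutable helper be reused across all records of a partition instead of allocating inside every map() call. A sketch under that assumption (lines is a hypothetical RDD[String]):

    val parsed = lines.mapPartitions { iter =>
      val buf = new StringBuilder   // allocated once per partition, reused
      iter.map { line =>
        buf.clear()
        buf.append(line).append('!')
        buf.toString                // emit an immutable copy per record
      }
    }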

how to convert RDD to PairRDDFunctions ?

2014-07-08 Thread Konstantin Kudryavtsev
, Konstantin Kudryavtsev
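
In Spark 1.x this is an implicit conversion: importing the members of the SparkContext companion object brings the PairRDDFunctions methods into scope on any RDD of pairs. A short sketch:

    import org.apache.spark.SparkContext._ // enables rddToPairRDDFunctions

    val rdd   = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 5)))
    val byKey = rdd.reduceByKey(_ + _)     // a PairRDDFunctions method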

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Konstantin Kudryavtsev
(do I need RPM installations, or only to build Spark on the edge node?) Thank you, Konstantin Kudryavtsev On Mon, Jul 7, 2014 at 4:34 AM, Robert James srobertja...@gmail.com wrote: I can say from my experience that getting Spark to work with Hadoop 2 is not for the beginner; after solving one

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Konstantin Kudryavtsev
, Konstantin Kudryavtsev On Mon, Jul 7, 2014 at 1:57 PM, Krishna Sankar ksanka...@gmail.com wrote: Konstantin, 1. You need to install the Hadoop RPMs on all nodes. If it is Hadoop 2, the nodes would have HDFS and YARN. 2. Then you need to install Spark on all nodes. I haven't had

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Konstantin Kudryavtsev
Hi Chester, Thank you very much, it is clear now - just two different ways to support Spark on a cluster. Thank you, Konstantin Kudryavtsev On Mon, Jul 7, 2014 at 3:22 PM, Chester @work ches...@alpinenow.com wrote: In YARN cluster mode, you can either have Spark on all the cluster nodes

Control number of tasks per stage

2014-07-07 Thread Konstantin Kudryavtsev
Hi all, is there any way to control the number of tasks per stage? Currently I see a situation where only 2 tasks are created per stage and each of them is very slow, while at the same time the cluster has a huge number of unused nodes. Thank you, Konstantin Kudryavtsev
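
Two common knobs, sketched for a text input (the path and numbers are illustrative): request more input partitions up front, or repartition an existing RDD so the following stages run more tasks.

    val lines = sc.textFile("hdfs:///data/input", 64) // minPartitions hint
    val wide  = lines.repartition(64)                 // later stages get 64 tasks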

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Konstantin Kudryavtsev
Hello, thanks for your message... I'm confused: Hortonworks suggests installing the Spark RPM on each node, but the Spark main page says that YARN is enough and I don't need to install it... What's the difference? sent from my HTC On Jul 6, 2014 8:34 PM, vs vinayshu...@gmail.com wrote: Konstantin, HWRK

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Konstantin Kudryavtsev
with Hadoop: Spark can run on Hadoop 2's YARN cluster manager, and can read any existing Hadoop data. If you have a Hadoop 2 cluster, you can run Spark without any installation needed. And this is confusing to me... do I need the RPM installation or not?... Thank you, Konstantin Kudryavtsev On Sun

Spark 1.0 failed on HDP 2.0 with absurd exception

2014-07-05 Thread Konstantin Kudryavtsev
) --worker-memory MEM Memory per Worker (e.g. 1000M, 2G) (Default: 1G) Seems like the old Spark notation; any ideas? Thank you, Konstantin Kudryavtsev
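
The --worker-memory flag in that usage text is indeed the pre-1.0 YARN vocabulary; Spark 1.0 renamed the worker-oriented flags to executor-oriented ones. A hedged sketch of a 1.0-style submission (the class and jar names are hypothetical):

    # --executor-memory replaces the old --worker-memory,
    # --num-executors replaces the old --num-workers.
    spark-submit --master yarn-client \
      --executor-memory 2g \
      --num-executors 4 \
      --class com.example.MyApp myapp.jar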

[no subject]

2014-07-05 Thread Konstantin Kudryavtsev
can it be fixed? Thank you, Konstantin Kudryavtsev

Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-04 Thread Konstantin Kudryavtsev
main class (required) ...bla-bla-bla any ideas? how can I make it work? Thank you, Konstantin Kudryavtsev
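
A "main class (required)" complaint from spark-submit usually means the --class flag was omitted. A sketch of the documented SparkPi submission for Spark 1.0 on YARN (the examples jar name depends on the build):

    # The trailing 10 is SparkPi's slice-count argument.
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn-cluster \
      lib/spark-examples-*.jar 10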

Re: Run spark unit test on Windows 7

2014-07-03 Thread Konstantin Kudryavtsev
/please-read-if-experiencing-job-failures?forum=hdinsight 2) put this file into d:\winutil\bin 3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\") after that the test runs. Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote: You
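
Put together, the workaround described in this thread looks roughly like the following in a test (the paths are the ones from the thread; winutils.exe must sit in d:\winutil\bin):

    import org.apache.spark.SparkContext

    // Point Hadoop's native-binary lookup at winutils before Spark starts.
    System.setProperty("hadoop.home.dir", "d:\\winutil\\")

    val sc = new SparkContext("local", "unit-test")
    try {
      // ... test body: RDD transformations and assertions ...
    } finally {
      if (sc != null) sc.stop()
    }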

Run spark unit test on Windows 7

2014-07-02 Thread Konstantin Kudryavtsev
(data) // rdd transformation, no access to SparkContext or Hadoop Assert.assertTrue(true) } finally { if (sc != null) sc.stop() } } Why is it trying to access Hadoop at all, and how can I fix it? Thank you in advance. Thank you, Konstantin Kudryavtsev

Re: Run spark unit test on Windows 7

2014-07-02 Thread Konstantin Kudryavtsev
:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote: Hi Konstantin, We use Hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38

NullPointerException on ExternalAppendOnlyMap

2014-07-02 Thread Konstantin Kudryavtsev
) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Do you have any idea what this is? How can I debug this issue, or perhaps access another log? Thank you, Konstantin Kudryavtsev

unsubscribe

2014-05-05 Thread Konstantin Kudryavtsev
unsubscribe Thank you, Konstantin Kudryavtsev

Re: Pig on Spark

2014-04-10 Thread Konstantin Kudryavtsev
Hi Mayur, I wonder if you could share your findings in some way (GitHub, blog post, etc.). I guess your experience will be very interesting/useful for many people. sent from Lenovo YogaTablet On Apr 8, 2014 8:48 PM, Mayur Rustagi mayur.rust...@gmail.com wrote: Hi Ankit, Thanx for all the work

Re: how to save RDD partitions in different folders?

2014-04-04 Thread Konstantin Kudryavtsev
Hi Evan, could you please provide a code snippet? It is not clear to me: in Hadoop you need to use the addNamedOutput method, and I'm stuck on how to do that from Spark. Thank you, Konstantin Kudryavtsev On Fri, Apr 4, 2014 at 5:27 PM, Evan Sparks evan.spa...@gmail.com wrote: Have a look
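
On the Spark side the usual equivalent of addNamedOutput is Hadoop's MultipleTextOutputFormat: override its file-naming hook so each key lands in its own folder, then hand it to saveAsHadoopFile. A hedged sketch (the RDD contents and output path are illustrative):

    import org.apache.spark.SparkContext._
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

    // Route each record into a folder named after its key.
    class KeyedOutput extends MultipleTextOutputFormat[Any, Any] {
      override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
        key.toString + "/" + name
    }

    val pairs = sc.parallelize(Seq(("2014-04-04", "a"), ("2014-04-05", "b")))
    pairs.saveAsHadoopFile("/out", classOf[String], classOf[String], classOf[KeyedOutput])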