Re: Spark application doesn't scale to worker nodes

2016-07-05 Thread Mathieu Longtin
System Classpath: http://10.2.0.4:35639/jars/SparkPOC.jar (Added By User)

On 4 July 2016 at 21:43, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> well this will be apparent from the Environment tab of GUI.

Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Mathieu Longtin
> ... is essentially a MLlib pipeline for training a classifier, in this case
> RandomForest, but it could be a DecisionTree just for the sake of simplicity.
>
> But when I submit the Spark application to the cluster via spark-submit, it
> is running out of memory. Even though the executors are "taken"/created in
> the cluster, they are essentially doing nothing (poor CPU and memory
> utilization) while the master seems to do all the work, which finally
> results in OOM.
>
> My submission is the following:
> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp SparkPOC.jar 10 4.3
>
> I am submitting from the master node.
>
> By default it is running in client mode, in which the driver process is
> attached to spark-shell.
>
> Do I need to set up some settings to make MLlib algos parallelized and
> distributed as well, or is it all driven by the parallel factor set on the
> dataframe with the input data?
>
> Essentially it seems that all work is just done on the master and the rest
> is idle. Any hints what to check?
>
> Thx
> Jakub
>
> --
> Jakub Stransky
> cz.linkedin.com/in/jakubstransky
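
A minimal PySpark sketch of one way a setup like the one above ends up with idle executors: if the input comes from the JDBC source implied by sqljdbc4.jar and is read without partitioning options, everything is pulled through a single task, so the MLlib stages downstream have nothing to parallelize. The connection URL, table and column names below are placeholders, not taken from the thread.

    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    conf = SparkConf().setAppName("DemoApp")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)

    # Placeholder connection and table details; partitionColumn/lowerBound/
    # upperBound/numPartitions make Spark issue several range queries so the
    # load is spread across executors instead of a single task.
    df = sqlContext.read.format("jdbc").options(
        url="jdbc:sqlserver://dbhost:1433;databaseName=demo",
        dbtable="training_data",
        partitionColumn="id",
        lowerBound="1",
        upperBound="1000000",
        numPartitions="16",
    ).load()

    print(df.rdd.getNumPartitions())  # should report 16 partitions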

Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Mathieu Longtin
> Do I need to set up some settings to make MLlib algos parallelized and
> distributed as well, or is it all driven by the parallel factor set on the
> dataframe with the input data?
>
> Essentially it seems that all work is just done on the master and the rest
> is idle. Any hints what to check?
>
> Thx
> Jakub

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
Try to figure out what the env vars and arguments of the worker JVM and
Python process are. Maybe you'll get a clue.

On Mon, Jul 4, 2016 at 11:42 AM Mathieu Longtin <math...@closetwork.org> wrote:
> I started with a download of 1.6.0. These days, we use a self-compiled 1.6.2.
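
A small sketch of how that inspection could be done, assuming a Linux host where /proc is available; the pid of the pyspark.daemon or executor JVM would come from ps or the executor logs, and is not something taken from this thread.

    import os

    # Linux-only sketch: show the command line and SPARK*/PYSPARK* environment
    # of a process, e.g. a pyspark.daemon or executor JVM pid found with `ps`.
    def describe_process(pid):
        with open("/proc/%d/cmdline" % pid) as f:
            argv = [a for a in f.read().split("\0") if a]
        with open("/proc/%d/environ" % pid) as f:
            env = dict(e.split("=", 1) for e in f.read().split("\0") if "=" in e)
        return argv, env

    argv, env = describe_process(os.getpid())  # replace os.getpid() with the worker's pid
    print(argv)
    print({k: v for k, v in env.items() if k.startswith(("SPARK", "PYSPARK"))})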

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
> ... the daemons? Your Spark was built from source code or downloaded as a
> binary, though that should not technically change anything?
>
> On Mon, Jul 4, 2016 at 9:03 PM, Mathieu Longtin <math...@closetwork.org> wrote:
>> 1.6.1.
>>
>> I have no idea. SPARK_WORKER_CORES should do the same.

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
1.6.1.

I have no idea. SPARK_WORKER_CORES should do the same.

On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav <ashraag...@gmail.com> wrote:
> Which version of Spark are you using? 1.6.1?
>
> Any ideas as to why it is not working in ours?

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
> ... number of cores in the node (1 parent and 3 workers). Limiting it via the
> spark-env.sh file by giving SPARK_WORKER_CORES=1 also didn't help.
>
> When you said it helped you and limited it to 2 processes in your cluster,
> how many cores did each machine have?

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
> ... same as setting "spark.executor.cores" to 1? And how can I specify
> "--cores=1" from the application?
>
> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <math...@closetwork.org> wrote:
>> When running the executor, put --cores=1. We use this
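
For reference, a hedged sketch of doing the equivalent from the application side via SparkConf. spark.executor.cores is a standard Spark setting; with one core per executor, each executor runs one task at a time, which should also bound the number of pyspark.daemon workers it forks, though the thread itself does not confirm this exact approach.

    from pyspark import SparkConf, SparkContext

    # Illustrative only: cap the cores each executor uses from within the app.
    conf = (SparkConf()
            .setAppName("limit-pyspark-daemons")
            .set("spark.executor.cores", "1"))
    sc = SparkContext(conf=conf)
    print(sc.getConf().get("spark.executor.cores"))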

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html

Re: Do tasks from the same application run in different JVMs

2016-06-29 Thread Mathieu Longtin
Same JVMs.

On Wed, Jun 29, 2016 at 8:48 AM Huang Meilong <ims...@outlook.com> wrote:
> Hi,
>
> In Spark, tasks from different applications run in different JVMs, then
> what about tasks from the same application?

Re: Reporting warnings from workers

2016-06-16 Thread Mathieu Longtin
On Jun 15, 2016 at 1:24 PM, Mathieu Longtin <math...@closetwork.org> wrote:
> Is there a way to report warnings from the workers back to the driver
> process?
>
> Let's say I have an RDD and do this:
>
> newrdd = rdd.map(somefunction)
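
One way this is commonly done (an illustration, not necessarily what was suggested later in the thread) is an accumulator that collects warning strings; the driver can only read it reliably after an action has run. somefunction and the warning condition below are placeholders.

    from pyspark import SparkContext, AccumulatorParam

    sc = SparkContext(appName="warning-report-sketch")

    # A list-valued accumulator the tasks append warning strings to.
    class ListParam(AccumulatorParam):
        def zero(self, initial):
            return []
        def addInPlace(self, a, b):
            a.extend(b)
            return a

    warn_acc = sc.accumulator([], ListParam())

    def somefunction(x):
        if x % 3 == 0:                       # placeholder warning condition
            warn_acc.add(["saw a multiple of 3: %d" % x])
        return x * 10

    newrdd = sc.parallelize(range(10)).map(somefunction)
    newrdd.count()        # accumulators are only reliably updated by actions
    print(warn_acc.value)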

Reporting warnings from workers

2016-06-15 Thread Mathieu Longtin
... Is that possible?

Re: Not able to write output to local filsystem from Standalone mode.

2016-05-25 Thread Mathieu Longtin
On Tue, May 24, 2016 at 4:04 PM, Mathieu Longtin <math...@closetwork.org> wrote:
> In standalone mode, executors assume they have access to a shared file
> system. The driver creates the
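
Two workarounds sketched below as assumptions, not as the thread's conclusion: either write to a path every executor can reach (HDFS, NFS, S3), or, for small results only, collect to the driver and write with plain Python. The paths are placeholders.

    from pyspark import SparkContext

    sc = SparkContext(appName="local-output-sketch")
    rdd = sc.parallelize(range(100)).map(str)

    # Option 1: write to storage the executors and the driver can all reach.
    # rdd.saveAsTextFile("hdfs:///tmp/output")   # or an NFS-mounted directory

    # Option 2: for small results only, bring the data to the driver and use
    # the driver's local filesystem directly.
    with open("/tmp/output.txt", "w") as f:
        f.write("\n".join(rdd.collect()))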

Re: Spark-submit hangs indefinitely after job completion.

2016-05-24 Thread Mathieu Longtin
> ...orSystem-akka.actor.default-dispatcher-2]
> remote.RemoteActorRefProvider$RemotingTerminator
> (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting shut down.
>
> I have to do a ctrl-c to terminate the spark-submit process. This is really
> a weird problem and I have no idea how to fix this. Please let me know if
> there are any logs I should be looking at, or doing things differently here.
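
Not from the thread, but a common hygiene step when spark-submit refuses to exit: stop the SparkContext explicitly at the end of the job so the driver's non-daemon threads can wind down on their own. A minimal sketch:

    from pyspark import SparkContext

    sc = SparkContext(appName="clean-shutdown-sketch")
    try:
        print(sc.parallelize(range(10)).sum())
    finally:
        sc.stop()   # lets the driver shut down instead of hanging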

Re: Not able to write output to local filsystem from Standalone mode.

2016-05-24 Thread Mathieu Longtin

Re: How to set the degree of parallelism in Spark SQL?

2016-05-23 Thread Mathieu Longtin
> ...1560.n3.nabble.com/How-to-set-the-degree-of-parallelism-in-Spark-SQL-tp26996.html
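
As a hedged pointer (not necessarily the answer given in the thread), the setting usually meant by "degree of parallelism" in Spark SQL is spark.sql.shuffle.partitions, which controls how many tasks shuffle stages (joins, aggregations) use. A minimal PySpark sketch:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="sql-parallelism-sketch")
    sqlContext = SQLContext(sc)

    # Shuffle stages produced by Spark SQL use this many tasks; default is 200.
    sqlContext.setConf("spark.sql.shuffle.partitions", "64")

    df = sqlContext.range(0, 1000)
    counts = df.groupBy((df.id % 10).alias("bucket")).count()
    print(counts.rdd.getNumPartitions())   # reflects the setting above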

Re: Starting executor without a master

2016-05-20 Thread Mathieu Longtin
> ... word all available nodes, but I am not convinced it will use those
> nodes? Someone can possibly clarify this.
>
> HTH
>
> Dr Mich Talebzadeh

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin
> ... master there's an interface you can implement to try that if you really
> want to (ExternalClusterManager), but it's currently "private[spark]" and it
> probably wouldn't be a very simple task.

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin
> ... spark-submit job by any chance, Mathieu?
>
> Cheers
>
> Dr Mich Talebzadeh

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin
> ...or-cores=2 \
>
> Dr Mich Talebzadeh

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin
On 19 May 2016 at 18:45, Mathieu Longtin <math...@closetwork.org> wrote:
> First a bit of context:
> We use Spark on a platform where each user starts workers as needed. This

Re: support for golang

2016-05-14 Thread Mathieu Longtin
> ... considered/attempted to support golang with Spark?
>
> Thanks,
> Sourav

Re: pyspark mappartions ()

2016-05-14 Thread Mathieu Longtin
On Tue, May 10, 2016 at 2:20 PM, Abi <analyst.tech.j...@gmail.com> wrote:
> Is there any example of this? I want to see how you write the iterable
> example.
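
A minimal sketch of the iterator-in/iterable-out pattern mapPartitions expects in PySpark; the per-partition sum/count computation is just a placeholder.

    from pyspark import SparkContext

    sc = SparkContext(appName="mappartitions-sketch")

    # mapPartitions hands the function an iterator over one partition and
    # expects an iterable back; a generator keeps memory use low.
    def per_partition(rows):
        total = 0
        count = 0
        for r in rows:              # `rows` is an iterator, consumed once
            total += r
            count += 1
        yield (total, count)        # one result per partition

    rdd = sc.parallelize(range(100), 4)
    print(rdd.mapPartitions(per_partition).collect())  # four (sum, count) pairs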

Re: Is this possible to do in spark ?

2016-05-12 Thread Mathieu Longtin
> I am not sure how to read two text files in Spark at the same time and
> associate them with the serial number. Is there a way of doing this in
> place, given that we know the directory structure? Or should we be
> transforming the data anyway to solve this?
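
One possible approach, built on an assumed layout rather than anything stated in the thread: if each serial number has its own directory holding the two files, wholeTextFiles returns (path, content) pairs, so the serial can be pulled out of the path and used as a join key. The /data/<serial>/a.txt layout below is hypothetical.

    from pyspark import SparkContext

    sc = SparkContext(appName="paired-files-sketch")

    # Hypothetical layout: /data/<serial>/a.txt and /data/<serial>/b.txt.
    def key_by_serial(path_content):
        path, content = path_content
        serial = path.rstrip("/").split("/")[-2]   # parent directory name
        return (serial, content)

    a = sc.wholeTextFiles("/data/*/a.txt").map(key_by_serial)
    b = sc.wholeTextFiles("/data/*/b.txt").map(key_by_serial)
    paired = a.join(b)        # (serial, (content of a.txt, content of b.txt))
    print(paired.take(1))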

Re: best fit - Dataframe and spark sql use cases

2016-05-10 Thread Mathieu Longtin
> ... experience.
>
> Thanks,
> Divya

Re: removing header from csv file

2016-05-03 Thread Mathieu Longtin
> On 27 April 2016 at 13:24, Ashutosh Kumar <kmr.ashutos...@gmail.com> wrote:
>> I see there is a library spark-csv which can be used for removing
>> header and processing of csv files. But it seems it works with sqlcontext
>> only. Is there a way to remove header from csv files without sqlcontext?
>>
>> Thanks
>> Ashutosh
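
A plain-RDD sketch of dropping the header without a SQLContext, assuming a single csv file so the header sits in the first partition; the input path is a placeholder.

    from pyspark import SparkContext

    sc = SparkContext(appName="drop-csv-header-sketch")

    # Skip the first line of the first partition -- with a single csv file,
    # that is where the header lives.
    def drop_header(index, lines):
        if index == 0:
            next(lines, None)
        return lines

    data = sc.textFile("/tmp/input.csv").mapPartitionsWithIndex(drop_header)
    print(data.take(3))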

Re: Transformation question

2016-04-27 Thread Mathieu Longtin
> Thanks in advance