Re: How to extract data in parallel from RDBMS tables

2019-03-28 Thread Surendra , Manchikanti
Hi Jason, Thanks for your reply, but I am looking for a way to extract all the tables in a database in parallel. On Thu, Mar 28, 2019 at 2:50 PM Jason Nerothin wrote: > Yes. > > If you use the numPartitions option, your max parallelism will be that > number. See also: partitionColumn,
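
For the "all tables at once" case asked about above, one possible approach (not something proposed in the thread) is to submit one extraction per table from separate threads, since Spark can run jobs submitted from different threads concurrently. The sketch below is a minimal illustration under that assumption; `extract_all_tables` and `extract_one` are hypothetical names, and `extract_one` stands in for whatever per-table JDBC read and write you would actually perform.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_all_tables(table_names, extract_one, max_workers=4):
    """Run extract_one(table_name) for every table on a thread pool.

    extract_one is a caller-supplied (hypothetical) function, e.g. one that
    reads the table over JDBC with a shared SparkSession and writes it out.
    Returns a dict mapping each table name to extract_one's return value.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every table first so the extractions can overlap.
        futures = {name: pool.submit(extract_one, name) for name in table_names}
        # result() blocks until that table is done and re-raises any error.
        return {name: future.result() for name, future in futures.items()}
```

`max_workers` caps how many tables are read at the same time, so it also bounds the number of concurrent connections this driver-side loop opens against the database (each read may open more if it is itself partitioned).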

unsubscribe

2019-03-28 Thread Byron Lee
unsubscribe

Re: spark.submit.deployMode: cluster

2019-03-28 Thread Jason Nerothin
Meant this one: https://docs.databricks.com/api/latest/jobs.html On Thu, Mar 28, 2019 at 5:06 PM Pat Ferrel wrote: > Thanks, are you referring to > https://github.com/spark-jobserver/spark-jobserver or the undocumented > REST job server included in Spark? > > > From: Jason Nerothin > Reply:

BLAS library class def not found error

2019-03-28 Thread Serena S Yuan
Hi, I was using the Apache Spark machine learning library in Java (posted this issue at https://stackoverflow.com/questions/55367722/apache-spark-in-java-machine-learning-com-github-fommil-netlib-f2jblas-dscalf?noredirect=1#comment97464462_55367722 ), and I had an error while trying to train

Re: spark.submit.deployMode: cluster

2019-03-28 Thread Pat Ferrel
Thanks, are you referring to https://github.com/spark-jobserver/spark-jobserver or the undocumented REST job server included in Spark? From: Jason Nerothin Reply: Jason Nerothin Date: March 28, 2019 at 2:53:05 PM To: Pat Ferrel Cc: Felix Cheung , Marcelo Vanzin , user Subject: Re:

Re: spark.submit.deployMode: cluster

2019-03-28 Thread Jason Nerothin
Check out the Spark Jobs API... it sits behind a REST service... On Thu, Mar 28, 2019 at 12:29 Pat Ferrel wrote: > ;-) > > Great idea. Can you suggest a project? > > Apache PredictionIO uses spark-submit (very ugly) and Apache Mahout only > launches trivially in test apps since most uses are

Re: How to extract data in parallel from RDBMS tables

2019-03-28 Thread Jason Nerothin
Yes. If you use the numPartitions option, your max parallelism will be that number. See also: partitionColumn, lowerBound, and upperBound https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html On Wed, Mar 27, 2019 at 23:06 Surendra , Manchikanti < surendra.manchika...@gmail.com> wrote:
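
As a rough illustration of the options named above, the following sketch builds the option map for a partitioned JDBC read. The helper names, the connection URL, the table, and the column are all made up for the example; only the option keys (`url`, `dbtable`, `partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) come from the Spark JDBC data source documentation linked in the message.

```python
def jdbc_partition_options(url, table, partition_column,
                           lower_bound, upper_bound, num_partitions):
    """Build the options for a partitioned Spark JDBC read (hypothetical helper).

    The bounds only decide how the partition column's range is split into
    strides; they do not filter rows, so all rows of the table are returned.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


def read_partitioned(spark, options):
    """Load a DataFrame with the given options.

    `spark` is assumed to be an existing SparkSession, with the JDBC driver
    for the target database already on the classpath.
    """
    return spark.read.format("jdbc").options(**options).load()


# Example values are placeholders, not taken from the thread.
options = jdbc_partition_options(
    "jdbc:postgresql://db.example.com/sales", "orders", "id", 1, 1000000, 8
)
```

With these options a single table is read by up to 8 concurrent tasks, which matches the "max parallelism" described in the reply.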

Re: Spark Profiler

2019-03-28 Thread bo yang
Yeah, these options are very valuable. Just adding another option :) We built a JVM profiler (https://github.com/uber-common/jvm-profiler) to monitor and profile Spark applications at large scale (e.g. sending metrics to Kafka / Hive for batch analysis). People could try it as well. On Wed, Mar 27,

Re: spark.submit.deployMode: cluster

2019-03-28 Thread Pat Ferrel
;-) Great idea. Can you suggest a project? Apache PredictionIO uses spark-submit (very ugly) and Apache Mahout only launches trivially in test apps since most uses are as a lib. From: Felix Cheung Reply: Felix Cheung Date: March 28, 2019 at 9:42:31 AM To: Pat Ferrel , Marcelo Vanzin Cc:

Re: Where does the Driver run?

2019-03-28 Thread Pat Ferrel
Thanks for the pointers. We’ll investigate. We have been told that the “Driver” is run in the launching JVM because deployMode = cluster is ignored if spark-submit is not used to launch. You are saying that there is a loophole and if you use one of these client classes there is a way to run part

Re: spark.submit.deployMode: cluster

2019-03-28 Thread Felix Cheung
If anyone wants to improve the docs, please create a PR. lol But seriously, you might want to explore other projects that manage job submission on top of Spark instead of rolling your own with spark-submit. From: Pat Ferrel Sent: Tuesday, March 26, 2019 2:38 PM

Adaptive query execution and CBO

2019-03-28 Thread Tomasz Krol
I asked this question a while ago on StackOverflow but got no response, so I'm trying here :) What's your experience with using adaptive query execution and CBO? Do you use them enabled together, or separately? Do you experience any issues using them? For example, I've seen that bucketing doesn't work

Re: Where does the Driver run?

2019-03-28 Thread Mich Talebzadeh
Hi, I have explained this in my following LinkedIn article "The Operational Advantages of Spark as a Distributed Processing Framework". An extract: *2) YARN Deployment Modes* The term Deployment mode of

Re: Streaming data out of spark to a Kafka topic

2019-03-28 Thread Mich Talebzadeh
Hi Gabor, I will look at the link and see what it provides. Thanks, Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Where does the Driver run?

2019-03-28 Thread Jianneng Li
Hi Pat, The driver runs in the same JVM as SparkContext. You didn't go into detail about how you "launch" the job (i.e. how the SparkContext is created), so it's hard for me to guess where the driver is. For reference, we've had success launching Spark programmatically to YARN in cluster mode