Re: How to use Update statement or call stored procedure of Oracle from Spark

2017-07-20 Thread Xiayun Sun
Is it only the Update statement, or do queries in general not work? And can you paste your code so far? We use stored procedures (MS SQL, though) from Spark all the time with different DB client libraries and have never had an issue. On 21 July 2017 at 03:19, Cassa L wrote: > Hi, > I
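A minimal sketch of the pattern being described, calling a stored procedure per partition over plain JDBC (assuming a DataFrame `df` whose first two columns match the procedure's parameters; the connection URL, credentials, and procedure name are illustrative, not from the thread):

```scala
import java.sql.DriverManager

// Hypothetical sketch: invoke a stored procedure for each row of a
// partition using a raw JDBC CallableStatement on the executors.
df.foreachPartition { rows =>
  val conn = DriverManager.getConnection(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password") // illustrative
  try {
    val stmt = conn.prepareCall("{call update_employee(?, ?)}") // illustrative proc
    rows.foreach { row =>
      stmt.setLong(1, row.getLong(0))
      stmt.setString(2, row.getString(1))
      stmt.execute()
    }
  } finally {
    conn.close()
  }
}
```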

Re: Question regarding Sparks new Internal authentication mechanism

2017-07-20 Thread Udit Mehrotra
Hi Marcelo, Thanks for looking into it. I have opened a JIRA for this: https://issues.apache.org/jira/browse/SPARK-21494 And yes, it works fine with the internal shuffle service. But on our system we have the external shuffle service/dynamic allocation configured by default. We wanted to try switching from

Re: Question regarding Sparks new Internal authentication mechanism

2017-07-20 Thread Marcelo Vanzin
Also, things seem to work with all your settings if you disable use of the shuffle service (which also means no dynamic allocation), if that helps you make progress in what you wanted to do. On Thu, Jul 20, 2017 at 4:25 PM, Marcelo Vanzin wrote: > Hmm... I tried this with
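For reference, a sketch of the suggested workaround as spark-defaults.conf entries (property names are the standard Spark 2.2 ones; verify against your deployment before relying on them):

```
spark.authenticate                true
spark.network.crypto.enabled      true
# Turning off the external shuffle service also means no dynamic allocation.
spark.shuffle.service.enabled     false
spark.dynamicAllocation.enabled   false
```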

Re: Question regarding Sparks new Internal authentication mechanism

2017-07-20 Thread Marcelo Vanzin
Hmm... I tried this with the new shuffle service (I generally have an old one running) and also see failures. I also noticed some odd things in your logs that I'm also seeing in mine, but it's better to track these in a bug instead of e-mail. Please file a bug and attach your logs there, I'll

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

2017-07-20 Thread ayan guha
Hi, As Mark said, scheduler mode works within an application, i.e. within a Spark session and Spark context. This is also clear if you think about where you set the configuration: in a SparkConf, which is used to build a context. If you are using YARN as the resource manager, however, you can set YARN with fair
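A minimal sketch of where that configuration lives (the application name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

// FAIR scheduling set here governs jobs *within* this one application;
// it does not rebalance resources across separate applications.
val spark = SparkSession.builder()
  .appName("fair-scheduling-demo") // illustrative name
  .config("spark.scheduler.mode", "FAIR")
  .getOrCreate()
```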

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

2017-07-20 Thread Mark Hamstra
The fair scheduler doesn't have anything to do with reallocating resources across Applications. https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-across-applications https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application On Thu, Jul 20, 2017 at

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

2017-07-20 Thread Gokula Krishnan D
Mark, Thanks for the response. Let me rephrase my statements. "I am submitting a Spark application (*Application #A*) with scheduler.mode set to FAIR and dynamicAllocation=true, and it got all the available executors. In the meantime, submitting another Spark Application (*Application #B*) with the

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

2017-07-20 Thread Mark Hamstra
First, Executors are not allocated to Jobs, but rather to Applications. If you run multiple Jobs within a single Application, then each of the Tasks associated with Stages of those Jobs has the potential to run on any of the Application's Executors. Second, once a Task starts running on an
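To make the within-application picture concrete, a sketch of two concurrent jobs launched from one SparkContext `sc` (pool names are illustrative; the fair scheduler interleaves their tasks over the application's executors):

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Two jobs submitted from separate threads of the *same* application
// compete for the same executors; FAIR mode balances tasks between pools.
Future {
  sc.setLocalProperty("spark.scheduler.pool", "poolA") // illustrative pool
  sc.parallelize(1 to 1000000).count()
}
Future {
  sc.setLocalProperty("spark.scheduler.pool", "poolB")
  sc.parallelize(1 to 1000000).count()
}
```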

Spark on Cloudera Configuration (Scheduler Mode = FAIR)

2017-07-20 Thread Gokula Krishnan D
Hello All, We have a cluster with 50 executors, each with 4 cores, so at most 200 cores are available. I am submitting a Spark application (Job A) with scheduler.mode set to FAIR and dynamicAllocation=true, and it got all the available executors. In the meantime, submitting another Spark Application

How to use Update statement or call stored procedure of Oracle from Spark

2017-07-20 Thread Cassa L
Hi, I want to use Spark to parallelize some update operations on an Oracle database. However, I could not find a way to call UPDATE statements (UPDATE Employee WHERE ???), use transactions, or call stored procedures from Spark/JDBC. Has anyone had this use case before, and how did you solve it?
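Spark's JDBC data source only reads and appends/overwrites whole tables, so a common pattern is to drop down to raw JDBC per partition and manage the transaction yourself. A sketch under that assumption (the table, columns, and connection details are illustrative):

```scala
import java.sql.DriverManager

// Sketch: batched UPDATEs with one transaction per partition.
df.foreachPartition { rows =>
  val conn = DriverManager.getConnection(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password") // illustrative
  conn.setAutoCommit(false) // group this partition's updates in one transaction
  try {
    val stmt = conn.prepareStatement(
      "UPDATE employee SET salary = ? WHERE emp_id = ?") // illustrative SQL
    rows.foreach { row =>
      stmt.setDouble(1, row.getDouble(1))
      stmt.setLong(2, row.getLong(0))
      stmt.addBatch()
    }
    stmt.executeBatch()
    conn.commit()
  } catch {
    case e: Exception => conn.rollback(); throw e
  } finally {
    conn.close()
  }
}
```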

Spark Streaming: Blocks and Partitions

2017-07-20 Thread Kalim, Faria
Hi, Just a quick clarification question: from what I understand, blocks in a batch together form a single RDD which is partitioned (usually using the HashPartitioner) across multiple tasks. First, is this correct? Second, the partitioner is called every single time a new task is created. Is
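For context, in the receiver-based model the block interval determines how many partitions each batch RDD gets; a sketch (intervals and app name are illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Each receiver cuts incoming data into blocks every blockInterval, and the
// blocks of one batch become the partitions of that batch's RDD, so
// partitions per receiver ≈ batchInterval / blockInterval (here 10s / 200ms = 50).
val conf = new SparkConf()
  .setAppName("streaming-blocks-demo") // illustrative name
  .set("spark.streaming.blockInterval", "200ms")
val ssc = new StreamingContext(conf, Seconds(10))
```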

Re: How to insert a dataframe as a static partition to a partitioned table

2017-07-20 Thread Chaoyu Tang
Thanks, Vadim. But I am looking for an API in DataSet, DataFrame, DataFrameWriter, etc. The way you suggested can be done via a query like spark.sql(""" ALTER TABLE `table` ADD PARTITION (partcol=1) LOCATION '/path/to/your/dataset' """), and before that I write it to a specified location
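Put together, the two-step approach under discussion looks roughly like this (assuming `df` holds the partition's rows; the table name and path are illustrative):

```scala
// Sketch: write one static partition's data to a location, then
// register that location as a partition of the Hive table.
val partitionPath = "/warehouse/mytable/partcol=1" // illustrative path
df.write.mode("overwrite").parquet(partitionPath)
spark.sql(
  s"ALTER TABLE `mytable` ADD PARTITION (partcol=1) LOCATION '$partitionPath'")
```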

Re: How to insert a dataframe as a static partition to a partitioned table

2017-07-20 Thread Vadim Semenov
This should work:

```
ALTER TABLE `table` ADD PARTITION (partcol=1) LOCATION '/path/to/your/dataset'
```

On Wed, Jul 19, 2017 at 6:13 PM, ctang wrote: > I wonder if there are any easy ways (or APIs) to insert a dataframe (or > DataSet), which does not contain the partition

Re: Failed to find Spark jars directory

2017-07-20 Thread Kaushal Shriyan
On Thu, Jul 20, 2017 at 7:51 PM, ayan guha wrote: > It depends on your need. There are clear instructions on how to run > mvn with specific Hive and Hadoop bindings. However, if you are starting > out, I suggest you use a prebuilt one. > Hi Ayan, I am setting up

Re: Failed to find Spark jars directory

2017-07-20 Thread ayan guha
It depends on your need. There are clear instructions on how to run mvn with specific Hive and Hadoop bindings. However, if you are starting out, I suggest you use a prebuilt one. On Fri, 21 Jul 2017 at 12:17 am, Kaushal Shriyan wrote: > On Thu, Jul 20, 2017 at
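For reference, a typical source build along the lines being discussed, per the Spark build documentation (adjust profiles to your Hadoop/Hive versions):

```
./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
```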

Re: Failed to find Spark jars directory

2017-07-20 Thread Kaushal Shriyan
On Thu, Jul 20, 2017 at 7:42 PM, ayan guha wrote: > You should download a prebuilt version. What you have got is the source > code; you need to build it to generate the jar files. > > Hi Ayan, Can you please help me understand how to build Spark to generate the jar files? Regards,

Re: Failed to find Spark jars directory

2017-07-20 Thread ayan guha
You should download a prebuilt version. What you have downloaded is the source code; you need to build it to generate the jar files. On Thu, 20 Jul 2017 at 10:35 pm, Kaushal Shriyan wrote: > Hi, > > I have downloaded spark-2.2.0.tgz on CentOS 7.x and when I invoke >

Spark sc.textFile() files with more partitions Vs files with less partitions

2017-07-20 Thread Gokula Krishnan D
Hello All, Our Spark applications are designed to process HDFS files (Hive external tables). We recently changed the Hive file sizes by setting the following parameters, to ensure the files average 512MB: set hive.merge.mapfiles=true set hive.merge.mapredfiles=true
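On the Spark side, the partition count that sc.textFile produces can also be steered explicitly; a small sketch (the path and counts are illustrative):

```scala
// By default each HDFS block of the input becomes one partition; the
// optional second argument asks for at least that many partitions.
val rdd = sc.textFile("hdfs:///data/mytable", minPartitions = 200) // illustrative
println(rdd.getNumPartitions)

// Alternatively, rebalance after loading:
val rebalanced = rdd.repartition(200)
```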

Failed to find Spark jars directory

2017-07-20 Thread Kaushal Shriyan
Hi, I have downloaded spark-2.2.0.tgz on CentOS 7.x and when I invoke /opt/spark-2.2.0/sbin/start-master.sh, I get *Failed to find Spark jars directory (/opt/spark-2.2.0/assembly/target/scala-2.10/jars). You need to build Spark with the target "package" before running this program.* I am

Re: Issue: Hive Table Stored as col(array) instead of Columns with Spark

2017-07-20 Thread Chetan Khatri
Has anyone faced the same kind of issue with Spark 2.0.1? On Thu, Jul 20, 2017 at 2:08 PM, Chetan Khatri wrote: > Hello All, > I am facing an issue storing a DataFrame to a Hive table with partitioning; > without partitioning it works fine. > > *Spark 2.0.1* > >

What does spark.python.worker.memory affect?

2017-07-20 Thread Cyanny LIANG
Hi, As the documentation says: spark.python.worker.memory is the amount of memory to use per Python worker process during aggregation, in the same format as JVM memory strings (e.g. 512m, 2g). If the memory used during aggregation goes above this amount, it will spill the data to disk. I searched the

solr data source not working

2017-07-20 Thread Imran Rajjad
I am unable to register Solr Cloud as a data source in Spark 2.1.0. Following the documentation at https://github.com/lucidworks/spark-solr#import-jar-file-via-spark-shell, I have used the 3.0.0.beta3 version. The system path is displaying the added jar as
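For reference, the read pattern that spark-solr's README describes looks roughly like this (a sketch; the ZooKeeper host and collection name are illustrative, and the option names should be verified against the library version in use):

```scala
// Hypothetical sketch of reading a SolrCloud collection via spark-solr.
val df = spark.read.format("solr")
  .option("zkhost", "zkhost1:2181")       // SolrCloud ZooKeeper ensemble
  .option("collection", "mycollection")   // illustrative collection name
  .load()
df.printSchema()
```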

Issue: Hive Table Stored as col(array) instead of Columns with Spark

2017-07-20 Thread Chetan Khatri
Hello All, I am facing an issue storing a DataFrame to a Hive table with partitioning; without partitioning it works fine. *Spark 2.0.1* finalDF.write.mode(SaveMode.Overwrite).partitionBy("week_end_date").saveAsTable(OUTPUT_TABLE.get) and I added the configuration below too:
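One commonly cited workaround for partitioned Hive writes on Spark 2.0.x is to pre-create the partitioned table and use insertInto with dynamic partitioning. A sketch, not confirmed as the fix for this particular thread (the table and column names are illustrative):

```scala
import org.apache.spark.sql.SaveMode

// Hypothetical sketch: create the partitioned table once, then insert.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
  CREATE TABLE IF NOT EXISTS output_table (id INT, value STRING)
  PARTITIONED BY (week_end_date STRING) STORED AS PARQUET
""")
// insertInto matches columns by position; the partition column must be last.
finalDF.write.mode(SaveMode.Overwrite).insertInto("output_table")
```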

Re: Setting initial weights of ml.classification.LogisticRegression similar to mllib.classification.LogisticRegressionWithLBFGS

2017-07-20 Thread Nick Pentreath
weightCol sets the weight for each individual row of data (training example). It does not set the initial coefficients. On Thu, 20 Jul 2017 at 10:22 Aseem Bansal wrote: > Hi > > I had asked about this somewhere else too and was told that weightCol > method does that > > On
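To illustrate the distinction, weightCol in spark.ml names a column of per-example instance weights in the training DataFrame (assuming a `trainingDF` that carries such a column; the column name is illustrative):

```scala
import org.apache.spark.ml.classification.LogisticRegression

// "weight" is a per-row instance weight column; it scales each example's
// contribution to the loss and does not seed the initial coefficients.
val lr = new LogisticRegression()
  .setWeightCol("weight") // illustrative column name
val model = lr.fit(trainingDF)
```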

Re: Setting initial weights of ml.classification.LogisticRegression similar to mllib.classification.LogisticRegressionWithLBFGS

2017-07-20 Thread Aseem Bansal
Hi, I had asked about this somewhere else too and was told that the weightCol method does that. On Thu, Jul 20, 2017 at 12:50 PM, Nick Pentreath wrote: > Currently it's not supported, but is on the roadmap: see > https://issues.apache.org/jira/browse/SPARK-13025 > > The

Re: Setting initial weights of ml.classification.LogisticRegression similar to mllib.classification.LogisticRegressionWithLBFGS

2017-07-20 Thread Nick Pentreath
Currently it's not supported, but is on the roadmap: see https://issues.apache.org/jira/browse/SPARK-13025 The most recent attempt is to start with simple linear regression, as here: https://issues.apache.org/jira/browse/SPARK-21386 On Thu, 20 Jul 2017 at 08:36 Aseem Bansal

Setting initial weights of ml.classification.LogisticRegression similar to mllib.classification.LogisticRegressionWithLBFGS

2017-07-20 Thread Aseem Bansal
We were able to set initial weights on https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS How can we set the initial weights on
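For contrast, the old mllib API referenced above accepts an explicit initial-weights vector through its run overload; a sketch (the training data and weights are illustrative):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// mllib's run() takes the starting coefficients directly; spark.ml's
// LogisticRegression has no equivalent setter yet (see SPARK-13025).
val training = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(1.0, 0.5, -0.2)),
  LabeledPoint(0.0, Vectors.dense(-0.8, 0.1, 0.3))))
val initialWeights = Vectors.dense(0.0, 0.0, 0.0) // one weight per feature
val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(training, initialWeights)
```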

Re: Spark 2.0 and Oracle 12.1 error

2017-07-20 Thread ayan guha
I remember facing similar issues when the table had certain data types (numerical fields, if I remember correctly). If possible, please validate the data types in your select statement; preferably do not use *, or apply some type conversion. On Thu, Jul 20, 2017 at 4:10 PM, Cassa L
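A sketch of that suggestion: push a subquery with explicit casts through the JDBC source rather than selecting the raw table (the connection details, table, and column names are illustrative):

```scala
// Casting problematic columns inside the dbtable subquery keeps
// Oracle types that Spark's JDBC dialect cannot map out of the result set.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // illustrative URL
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("dbtable",
    "(SELECT id, CAST(json_col AS VARCHAR2(4000)) AS json_col FROM employee) t")
  .load()
```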

Spark-2.0 and Oracle 12.1 error: Unsupported type -101

2017-07-20 Thread Cassa L
Hi, I am trying to read data into Spark from Oracle using the ojdbc7 driver. The data is in JSON format. I am getting the error below. Any idea how to resolve it? java.sql.SQLException: Unsupported type -101 at

Spark 2.0 and Oracle 12.1 error

2017-07-20 Thread Cassa L
Hi, I am trying to use Spark 2.0 to read from an Oracle (12.1) table. My table has JSON data. I am getting the exception below in my code. Any clue? java.sql.SQLException: Unsupported type -101 at