Is it only the Update statement, or do queries in general not work? And can
you paste your code so far?
We use stored procedures (MS SQL, though) from Spark all the time with
different DB client libraries and have never had an issue.
On 21 July 2017 at 03:19, Cassa L wrote:
> Hi,
> I
Hi Marcelo,
Thanks for looking into it. I have opened a jira for this:
https://issues.apache.org/jira/browse/SPARK-21494
And yes, it works fine with internal shuffle service. But for our system we
have external shuffle/dynamic allocation configured by default. We wanted
to try switching from
Also, things seem to work with all your settings if you disable use of
the shuffle service (which also means no dynamic allocation), if that
helps you make progress in what you wanted to do.
On Thu, Jul 20, 2017 at 4:25 PM, Marcelo Vanzin wrote:
> Hmm... I tried this with
Hmm... I tried this with the new shuffle service (I generally have an
old one running) and also see failures. I also noticed some odd things
in your logs that I'm also seeing in mine, but it's better to track
these in a bug instead of e-mail.
Please file a bug and attach your logs there, I'll
Hi
As Mark said, scheduler mode works within an application, i.e. within a Spark
session and Spark context. This is also clear if you think about where you set
the configuration: in a SparkConf, which is used to build a context.
If you are using YARN as the resource manager, however, you can set YARN up
with fair
The fair scheduler doesn't have anything to do with reallocating resources
across applications.
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-across-applications
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
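For the within-application case specifically, FAIR pools are declared in an XML allocation file referenced by `spark.scheduler.allocation.file` (with `spark.scheduler.mode=FAIR` on the SparkConf). A minimal sketch following the format in the job-scheduling docs linked above; the pool name and numbers are placeholders:

```xml
<?xml version="1.0"?>
<!-- fairscheduler.xml: pools for FAIR scheduling *within* one application -->
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```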
On Thu, Jul 20, 2017 at
Mark, Thanks for the response.
Let me rephrase my statements.
"I am submitting a Spark application (*Application* #A) with scheduler.mode
as FAIR and dynamicAllocation=true, and it got all the available executors.
In the meantime, submitting another Spark Application (*Application* # B)
with the
First, Executors are not allocated to Jobs, but rather to Applications. If
you run multiple Jobs within a single Application, then each of the Tasks
associated with Stages of those Jobs has the potential to run on any of the
Application's Executors. Second, once a Task starts running on an
Hello All,
We have a cluster with 50 executors, each with 4 cores, so at most 200 cores
(concurrent tasks) are available.
I am submitting a Spark application (Job A) with scheduler.mode set to FAIR
and dynamicAllocation=true, and it got all the available executors.
In the meantime, submitting another Spark Application
Hi,
I want to use Spark to parallelize some update operations on an Oracle
database. However, I could not find a way to issue UPDATE statements (UPDATE
Employee ... WHERE ???), use transactions, or call stored procedures from
Spark/JDBC.
Has anyone had this use case before and how did you solve it?
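Spark's JDBC data source only reads and writes whole tables; it exposes no UPDATE or stored-procedure API. A common workaround is to open a plain JDBC connection inside `foreachPartition` and issue the statements yourself, batching them into one transaction per partition. A minimal sketch of that pattern, with Python's built-in `sqlite3` standing in for an Oracle connection (the `employee` table and column names are made up):

```python
import sqlite3

def update_partition(rows, conn):
    """Apply one UPDATE per row inside a single transaction.

    In a real Spark job this function would run inside
    df.foreachPartition(...), opening its own DB connection per
    partition. Here `conn` is a sqlite3 stand-in for Oracle.
    """
    cur = conn.cursor()
    try:
        cur.executemany(
            "UPDATE employee SET salary = ? WHERE emp_id = ?",
            [(salary, emp_id) for emp_id, salary in rows],
        )
        conn.commit()  # one commit per partition, not per row
    except Exception:
        conn.rollback()
        raise

# Demo with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, 100.0), (2, 200.0)])
update_partition([(1, 150.0), (2, 250.0)], conn)
print(conn.execute("SELECT salary FROM employee ORDER BY emp_id").fetchall())
# -> [(150.0,), (250.0,)]
```

Committing once per partition keeps the transaction count manageable; the same shape works for calling stored procedures via the driver's own call syntax.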
Hi,
Just a quick clarification question: from what I understand, blocks in a batch
together form a single RDD which is partitioned (usually using the
HashPartitioner) across multiple tasks. First, is this correct? Second, the
partitioner is called every single time a new task is created. Is
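For reference, Spark's HashPartitioner simply maps a key to its hash code modulo the partition count (made non-negative), so identical keys always land in the same partition regardless of which task evaluates them. A pure-Python analogue of that logic (an illustration, not Spark code):

```python
def hash_partition(key, num_partitions):
    """Mimic HashPartitioner: non-negative hash modulo the partition count."""
    # Python's % already yields a non-negative result for a positive modulus.
    return hash(key) % num_partitions

keys = ["a", "b", "a", "c", "a"]
placement = {k: hash_partition(k, 4) for k in keys}
# Every occurrence of "a" maps to the same partition index.
```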
Thanks Vadim. But I am looking for an API either in DataSet, DataFrame, or
DataFrameWriter etc. The way you suggested can be done via a query like
spark.sql(""" ALTER TABLE `table` ADD PARTITION (partcol=1) LOCATION
'/path/to/your/dataset' """), and before that I write it to a specified
location
This should work:
```
ALTER TABLE `table` ADD PARTITION (partcol=1) LOCATION
'/path/to/your/dataset'
```
On Wed, Jul 19, 2017 at 6:13 PM, ctang wrote:
> I wonder if there are any easy ways (or APIs) to insert a dataframe (or
> DataSet), which does not contain the partition
On Thu, Jul 20, 2017 at 7:51 PM, ayan guha wrote:
> It depends on your need. There are clear instructions around how to run
> mvn with specific Hive and Hadoop bindings. However, if you are starting
> out, I suggest you use the prebuilt ones.
>
Hi Ayan,
I am setting up
It depends on your need. There are clear instructions around how to run mvn
with specific Hive and Hadoop bindings. However, if you are starting out, I
suggest you use the prebuilt ones.
On Fri, 21 Jul 2017 at 12:17 am, Kaushal Shriyan
wrote:
> On Thu, Jul 20, 2017 at
On Thu, Jul 20, 2017 at 7:42 PM, ayan guha wrote:
> You should download a prebuilt version. The code you have got is the source
> code; you need to build it to generate the jar files.
>
>
Hi Ayan,
Can you please help me understand how to build it to generate the jar files?
Regards,
You should download a prebuilt version. The code you have got is the source
code; you need to build it to generate the jar files.
On Thu, 20 Jul 2017 at 10:35 pm, Kaushal Shriyan
wrote:
> Hi,
>
> I have downloaded spark-2.2.0.tgz on CentOS 7.x and when i invoke
>
Hello All,
Our Spark applications are designed to process HDFS files (Hive external
tables).
We recently modified the Hive file sizing by setting the following parameters
to ensure that files have an average size of 512 MB.
set hive.merge.mapfiles=true
set hive.merge.mapredfiles=true
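For the 512 MB target specifically, the merge flags above are usually paired with the `hive.merge.*` size knobs (512 MB = 536870912 bytes); the exact defaults vary by Hive version, so these values are worth double-checking:

```sql
-- target size of merged files: 512 MB
set hive.merge.size.per.task=536870912;
-- merge is triggered when the average output file size is below this threshold
set hive.merge.smallfiles.avgsize=268435456;
```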
Hi,
I have downloaded spark-2.2.0.tgz on CentOS 7.x and when i invoke
/opt/spark-2.2.0/sbin/start-master.sh, i get
*Failed to find Spark jars directory
(/opt/spark-2.2.0/assembly/target/scala-2.10/jars). You need to build
Spark with the target "package" before running this program.*
I am
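That error usually means spark-2.2.0.tgz is the *source* release, which ships no jars. The two usual fixes, roughly as the Spark build docs describe them (the Hadoop version in the file name below is just an example):

```shell
# Option 1: use the prebuilt binary distribution instead of the source tarball
# (download spark-2.2.0-bin-hadoop2.7.tgz from the Apache mirrors, then)
tar -xzf spark-2.2.0-bin-hadoop2.7.tgz
./spark-2.2.0-bin-hadoop2.7/sbin/start-master.sh

# Option 2: build the jars from the source release
cd /opt/spark-2.2.0
./build/mvn -DskipTests clean package
```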
Has anyone faced the same kind of issue with Spark 2.0.1?
On Thu, Jul 20, 2017 at 2:08 PM, Chetan Khatri
wrote:
> Hello All,
> I am facing an issue with storing a DataFrame to a Hive table with
> partitioning; without partitioning it works fine.
>
> *Spark 2.0.1*
>
>
Hi
As the documentation says:
spark.python.worker.memory
Amount of memory to use per python worker process during aggregation, in
the same format as JVM memory strings (e.g. 512m, 2g). If the memory used
during aggregation goes above this amount, it will spill the data into
disks.
I search the
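The "JVM memory string" format the documentation mentions is just a number plus a unit suffix. A small helper showing how such strings map to bytes (an illustration of the format, not Spark's own parser):

```python
def parse_jvm_memory(s):
    """Convert a JVM-style memory string like '512m' or '2g' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}
    s = s.strip().lower()
    if s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)  # bare number: already bytes

print(parse_jvm_memory("512m"))  # -> 536870912
```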
I am unable to register Solr Cloud as a data source in Spark 2.1.0.
Following the documentation at
https://github.com/lucidworks/spark-solr#import-jar-file-via-spark-shell, I
have used the 3.0.0.beta3 version.
The system path is displaying the added jar as
Hello All,
I am facing an issue with storing a DataFrame to a Hive table with
partitioning; without partitioning it works fine.
*Spark 2.0.1*
finalDF.write.mode(SaveMode.Overwrite).partitionBy("week_end_date").saveAsTable(OUTPUT_TABLE.get)
and added below configuration too:
weightCol sets the weight for each individual row of data (training
example). It does not set the initial coefficients.
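To illustrate what a row weight means (plain Python, not Spark API code): in a weighted loss, each example's error is scaled by its weight, so a weight of 2.0 makes a row count as if it appeared twice in the training data:

```python
def weighted_squared_loss(preds, labels, weights):
    """Weighted sum of squared errors: each row's error scaled by its weight."""
    return sum(w * (p - y) ** 2 for p, y, w in zip(preds, labels, weights))

# A row with weight 2.0 contributes exactly as much as two identical
# weight-1.0 copies of that row.
a = weighted_squared_loss([1.0, 3.0], [0.0, 0.0], [1.0, 2.0])
b = weighted_squared_loss([1.0, 3.0, 3.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
# a == b == 19.0
```

Initial coefficients, by contrast, only change where the optimizer starts, not how much each row counts.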
On Thu, 20 Jul 2017 at 10:22 Aseem Bansal wrote:
> Hi
>
> I had asked about this somewhere else too and was told that weightCol
> method does that
>
> On
Hi
I had asked about this somewhere else too and was told that weightCol
method does that
On Thu, Jul 20, 2017 at 12:50 PM, Nick Pentreath
wrote:
> Currently it's not supported, but is on the roadmap: see
> https://issues.apache.org/jira/browse/SPARK-13025
>
> The
Currently it's not supported, but is on the roadmap: see
https://issues.apache.org/jira/browse/SPARK-13025
The most recent attempt is to start with simple linear regression, as here:
https://issues.apache.org/jira/browse/SPARK-21386
On Thu, 20 Jul 2017 at 08:36 Aseem Bansal
We were able to set initial weights on
https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
How can we set the initial weights on
I remember facing similar issues when the table had a few particular data
types, numerical fields if I remember correctly. If possible, please validate
the data types in your select statement, and preferably do not use *, or use
some type conversion.
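Type -101 is typically Oracle's TIMESTAMP WITH TIME ZONE (OracleTypes.TIMESTAMPTZ), which the Spark JDBC source doesn't map. In line with the suggestion above, the usual workaround is to cast inside a subquery passed as the JDBC `dbtable` option instead of reading the table directly (table and column names here are placeholders):

```sql
-- pass this subquery as the JDBC "dbtable" option rather than the raw table name
(SELECT emp_id,
        CAST(created_ts AS TIMESTAMP) AS created_ts,  -- drops the time zone
        json_payload
   FROM employee) t
```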
On Thu, Jul 20, 2017 at 4:10 PM, Cassa L
Hi,
I am trying to read data into Spark from Oracle using the ojdbc7 driver. The
data is in JSON format. I am getting the error below. Any idea on how to
resolve it?
java.sql.SQLException: Unsupported type -101
at
Hi,
I am trying to use Spark to read from an Oracle (12.1) table using Spark 2.0.
My table has JSON data. I am getting the exception below in my code. Any clue?
java.sql.SQLException: Unsupported type -101
at