If you are running Spark with local[*] as master, there will be a single
JVM process whose memory is controlled by the --driver-memory command-line
option of spark-submit. Check
http://spark.apache.org/docs/latest/configuration.html
spark.driver.memory (default: 1g) - Amount of memory to use for the driver process
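For example, a launch along these lines (the class and jar names here are
placeholders, not from the thread):

    spark-submit --master "local[*]" --driver-memory 4g \
        --class com.example.MyApp my-app.jar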
So you want to set an accumulator to 1 after a transformation has fully
completed? Or what exactly do you want to do?
On Mon, Nov 13, 2017 at 9:47 PM vaquar khan wrote:
> Confirmed, you can use Accumulators :)
>
> Regards,
> Vaquar khan
>
> On Mon, Nov 13, 2017 at 10:58
https://stackoverflow.com/questions/26562033/how-to-set-apache-spark-executor-memory
Regards,
Vaquar khan
On Mon, Nov 13, 2017 at 6:22 PM, Alec Swan wrote:
> Hello,
>
> I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
> format. Effectively, my Java
Confirmed, you can use Accumulators :)
Regards,
Vaquar khan
On Mon, Nov 13, 2017 at 10:58 AM, Kedarnath Dixit <
kedarnath_di...@persistent.com> wrote:
> Hi,
>
>
> We need some way to toggle the flag of a variable in a transformation.
>
>
> We are thinking to make use of spark Accumulators for
Hi Ashish, bear in mind that EMR has some additional tooling available that
smooths out some S3 problems that you may (almost certainly will) encounter.
We are using Spark with S3, not on EMR, and have encountered issues with file
consistency; you can deal with it, but be aware it's additional
Hi Joel,
Here are the relevant snippets of my code and the OOM error thrown
in frameWriter.save(..). Surprisingly, the heap dump is pretty small (~60MB),
even though I am running with -Xmx10G and 4G executor and driver memory, as
shown below.
SparkConf sparkConf = new SparkConf()
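For reference, an embedded local-mode setup of the kind described usually
looks something like this (a sketch with illustrative values, not the actual
code from the snippet above):

    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.SparkSession;

    SparkConf sparkConf = new SparkConf()
            .setMaster("local[*]")
            .setAppName("json-to-orc");   // placeholder app name

    SparkSession spark = SparkSession.builder()
            .config(sparkConf)
            .getOrCreate();

Worth noting: with master=local[*] the driver and executors all run inside
the service's own JVM, so the effective memory ceiling is that JVM's -Xmx
rather than the executor memory setting.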
Another option that we are trying internally is to use Mesos for isolating
different jobs or groups. Within a single group, using Livy to create
different Spark contexts also works.
- Affan
On Tue, Nov 14, 2017 at 8:43 AM, ashish rawat wrote:
> Thanks Sky Yin. This really
Thanks Sky Yin. This really helps.
On Nov 14, 2017 12:11 AM, "Sky Yin" wrote:
We are running Spark in AWS EMR as a data warehouse. All data are in S3 and
metadata in the Hive metastore.
We have internal tools to create Jupyter notebooks on the dev cluster. I
guess you can use
Have you tried increasing driver and executor memory (GC overhead too, if
required)? Your code snippet and stack trace would be helpful.
On Mon, Nov 13, 2017 at 7:23 PM Alec Swan wrote:
> Hello,
>
> I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
> format.
Hello,
I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
format. Effectively, my Java service starts up an embedded Spark cluster
(master=local[*]) and uses Spark SQL to convert JSON to ORC. However, I
keep getting OOM errors with large (~1GB) files.
I've tried different ways
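For anyone following along, the conversion being described has roughly this
shape (a sketch only; the paths are placeholders and spark is an existing
SparkSession):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Snappy-compressed JSON is decompressed via the Hadoop codec on read
    Dataset<Row> df = spark.read().json("/data/input/*.json.snappy");  // placeholder path

    df.write()
      .format("orc")
      .option("compression", "zlib")   // ZLIB-compressed ORC output
      .save("/data/output/orc");       // placeholder path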
This is not a Databricks forum.
On Mon, Nov 13, 2017 at 3:18 PM, Benjamin Kim wrote:
> I have a question about this. The documentation compares the concept to
> BigQuery. Does this mean that we will no longer need to deal with
> instances and just pay for execution
To add, we have a CDH 5.12 cluster with Spark 2.2 in our data center.
On Mon, Nov 13, 2017 at 3:15 PM Benjamin Kim wrote:
> Does anyone know if there is a connector for AWS Kinesis that can be used
> as a source for Structured Streaming?
>
> Thanks.
>
>
You can use Databricks to connect to Kinesis:
https://databricks.com/blog/2017/08/09/apache-sparks-structured-streaming-with-amazon-kinesis-on-databricks.html
Cheers
Jules
Sent from my iPhone
Pardon the dumb thumb typos :)
> On Nov 13, 2017, at 3:15 PM, Benjamin Kim
I have a question about this. The documentation compares the concept to
BigQuery. Does this mean that we will no longer need to deal with instances
and just pay for execution duration and the amount of data processed? I'm
just curious about how this will be priced.
Also, when will it be ready
Does anyone know if there is a connector for AWS Kinesis that can be used
as a source for Structured Streaming?
Thanks.
I need it cached to improve throughput; I only hope it can be refreshed once
a day, not every batch.
> On Nov 13, 2017, at 4:49 PM, Burak Yavuz wrote:
>
> I think if you don't cache the jdbc table, then it should auto-refresh.
>
> On Mon, Nov 13, 2017 at 1:21 PM, spark
I think if you don't cache the jdbc table, then it should auto-refresh.
On Mon, Nov 13, 2017 at 1:21 PM, spark receiver
wrote:
> Hi
>
> I'm using Structured Streaming (Spark 2.2) to receive Kafka messages, and
> it works great. The thing is, I need to join the Kafka message with a
Hi
I'm using Structured Streaming (Spark 2.2) to receive Kafka messages, and it
works great. The thing is, I need to join the Kafka messages with a
relatively static table stored in a MySQL database (let's call it metadata
here).
So, is it possible to reload the metadata table after some time interval (like
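A rough sketch in Java of the uncached-JDBC pattern Burak describes above,
assuming a SparkSession named spark and a streaming Dataset named kafkaStream
(the URL, credentials, table, and column names are placeholders):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Read the MySQL table *without* caching it; because no cache pins a
    // snapshot, the JDBC source is re-read when each micro-batch runs.
    Dataset<Row> metadata = spark.read()
            .format("jdbc")
            .option("url", "jdbc:mysql://dbhost:3306/mydb")  // placeholder URL
            .option("dbtable", "metadata")
            .option("user", "dbuser")                        // placeholder
            .option("password", "dbpass")                    // placeholder
            .load();

    // Stream-static join; metadata reflects the database as of each batch.
    Dataset<Row> enriched = kafkaStream.join(metadata, "join_key");  // placeholder key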
We are running Spark in AWS EMR as a data warehouse. All data are in S3 and
metadata in the Hive metastore.
We have internal tools to create Jupyter notebooks on the dev cluster. I
guess you can use Zeppelin instead, or Livy?
We run Genie as a job server for the prod cluster, so users have to submit
If you have only one user, it's still possible to execute non-blocking,
long-running queries.
The best way is to have different users, with pre-assigned resources, run
their queries.
HTH
Thanks
Deepak
On Nov 13, 2017 23:56, "ashish rawat" wrote:
> Thanks Everyone. I am still
Thanks everyone. I am still not clear on what the right way is to support
multiple users running concurrent queries with Spark. Is it through multiple
Spark contexts or through Livy (which creates a single Spark context only)?
Also, what kind of isolation is possible with Spark SQL? If
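One concrete option within a single context is Spark's FAIR scheduler with
per-user pools. A minimal sketch in Java, assuming sc is the SparkContext
(the allocation-file path and pool name are placeholders):

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
            .set("spark.scheduler.mode", "FAIR")
            .set("spark.scheduler.allocation.file",
                 "/etc/spark/fairscheduler.xml");  // placeholder path

    // Each user's queries run on their own thread; jobs inherit the pool
    // set as a thread-local property on the SparkContext.
    sc.setLocalProperty("spark.scheduler.pool", "user_a_pool");  // placeholder pool

Note this only gives scheduling fairness within one JVM, not memory
isolation; for hard isolation, separate contexts (via Livy or Mesos, as
suggested elsewhere in this thread) are the usual answer.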
Hi,
We need some way to toggle the flag of a variable in a transformation.
We are thinking of making use of Spark Accumulators for this purpose.
Can we use these as below?
Variables -> Initial Value
Variable1 -> 0
Variable2 -> 0
In one of the transformations, if we need to make
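If it helps, here is a minimal Java sketch of the accumulator-as-flag idea
(Spark 2.x; jsc, input, and shouldToggle are placeholders):

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.util.LongAccumulator;

    LongAccumulator variable1 = jsc.sc().longAccumulator("variable1");

    JavaRDD<String> out = input.map(record -> {
        if (shouldToggle(record)) {   // hypothetical condition
            variable1.add(1);         // flip the flag from inside the transformation
        }
        return record;
    });

    out.count();                      // accumulator values are only dependable
                                      // after an action has forced execution
    boolean toggled = variable1.value() > 0;

One caveat (this is what the follow-up question above is getting at):
accumulator updates made in transformations can be applied more than once if
tasks are retried, so treat the value as a flag ("was it ever set?") rather
than an exact count.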
Hi,
you can truncate datetimes like this (in pyspark), e.g. to 5 minutes:

import pyspark.sql.functions as F

# cast('long') yields epoch seconds; floor-dividing by 300 and scaling back
# buckets each timestamp into its 5-minute window
df.select((F.floor(F.col('myDateColumn').cast('long') / 300) * 300)
          .cast('timestamp'))
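(The same arithmetic gives other granularities, e.g. 3600 for hourly or 86400
for daily buckets. As far as I know, the built-in trunc() only supports year
and month, which is why a workaround like this is needed for finer units.)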
Best,
Eike
David Hodefi wrote on Mon., 13 Nov 2017 at 12:27:
I am familiar with those functions; none of them actually truncates a date.
We can use those methods to help implement a truncate method. I think
truncating to a day/hour should be as simple as truncate(..., "DD") or
truncate(..., "HH").
On Thu, Nov 9, 2017 at 8:23 PM, Gaspar Muñoz