Re: Spark kryo serialization register Datatype[]

2016-12-21 Thread Vadim Semenov
To enable the Kryo serializer you just need to pass `spark.serializer=org.apache.spark.serializer.KryoSerializer`. The `spark.kryo.registrationRequired` setting controls the following behavior: Whether to require registration with Kryo. If set to 'true', Kryo will throw an exception if an unregistered
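A minimal sketch of that configuration in code, with hypothetical class registrations (`Array[DataType]` stands in for the Datatype[] from the subject):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.types.{DataType, StructField}

// Minimal sketch: enable Kryo, require registration, and register the
// classes the job actually serializes (the ones listed here are examples).
val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  .registerKryoClasses(Array(
    classOf[Array[DataType]],
    classOf[Array[StructField]]
  ))
```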

Re: Livy with Spark

2016-12-05 Thread Vadim Semenov
You mean share a single spark context across multiple jobs? https://github.com/spark-jobserver/spark-jobserver does the same. On Mon, Dec 5, 2016 at 9:33 AM, Mich Talebzadeh wrote: > Hi, > > Has there been any experience using Livy with Spark to share multiple > Spark

Re: Live data visualisations with Spark

2016-11-08 Thread Vadim Semenov
Take a look at https://zeppelin.apache.org. On Tue, Nov 8, 2016 at 11:13 AM, Andrew Holway < andrew.hol...@otternetworks.de> wrote: > Hello, > > A colleague and I are trying to work out the best way to provide live data > visualisations based on Spark. Is it possible to explore a dataset in spark

Re: How to avoid unnecessary spark startups on every request?

2016-11-02 Thread Vadim Semenov
Take a look at https://github.com/spark-jobserver/spark-jobserver or https://github.com/cloudera/livy. You can launch a persistent spark context and then submit your jobs using an already running context. On Wed, Nov 2, 2016 at 3:34 AM, Fanjin Zeng wrote: > Hi, > > I

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-31 Thread Vadim Semenov
Have you tried to get the number of threads in a running process using `cat /proc/<pid>/status`? On Sun, Oct 30, 2016 at 11:04 PM, kant kodali wrote: > yes I did run ps -ef | grep "app_name" and it is root. > > > > On Sun, Oct 30, 2016 at 8:00 PM, Chan Chor Pang
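For completeness, a Linux-only sketch (not from the thread) that reads the same counter from inside the JVM:

```scala
import scala.io.Source

// Minimal sketch, Linux-only: /proc/self/status exposes the same "Threads:"
// counter for the current process that `cat /proc/<pid>/status` shows
// from the outside.
val source = Source.fromFile("/proc/self/status")
try {
  source.getLines()
    .find(_.startsWith("Threads:"))
    .foreach(println) // e.g. "Threads:  58"
} finally {
  source.close()
}
```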

Re: Any Dynamic Compilation of Scala Query

2016-10-26 Thread Vadim Semenov
You can use Cloudera Livy for that: https://github.com/cloudera/livy. Take a look at this example: https://github.com/cloudera/livy#spark-example. On Wed, Oct 26, 2016 at 4:35 AM, Mahender Sarangam < mahender.bigd...@outlook.com> wrote: > Hi, > > Is there any way to dynamically execute a string
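A minimal sketch of the idea, assuming an already-created Livy interactive session; the host, port, and session id are hypothetical:

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Minimal sketch: POST a string of Scala code to a running Livy session's
// statements endpoint. Error handling and polling for the result are omitted.
val url = new URL("http://livy-host:8998/sessions/0/statements")
val conn = url.openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
conn.setRequestProperty("Content-Type", "application/json")
conn.setDoOutput(true)

val payload = """{"code": "sc.parallelize(1 to 10).sum()"}"""
conn.getOutputStream.write(payload.getBytes("UTF-8"))

// The response is JSON describing the queued statement.
println(Source.fromInputStream(conn.getInputStream).mkString)
```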

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-01 Thread Vadim Semenov
Oh, and try to run even smaller executors, i.e. with `spark.executor.memory` <= 16 GiB. I wonder what result you're going to get. On Sun, Oct 2, 2016 at 1:24 AM, Vadim Semenov <vadim.seme...@datadoghq.com> wrote: > > Do you mean running a multi-JVM 'cluster' on the single machine
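The 17179869176 in the error is (2^31 - 1) * 8 bytes, i.e. 8 bytes short of 16 GiB, which appears to be the largest page Spark's task memory manager will allocate; hence the suggestion to stay at or below that per-executor threshold. A minimal sketch of such a configuration (app name and instance counts are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: more executors, each with a smaller heap, so that no
// single task tries to allocate a page near the ~16 GiB limit.
val conf = new SparkConf()
  .setAppName("sort-job")
  .set("spark.executor.instances", "8")
  .set("spark.executor.memory", "16g")
  .set("spark.executor.cores", "2")
val sc = new SparkContext(conf)
```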

Re: get different results when debugging and running scala program

2016-10-01 Thread Vadim Semenov
The question has no connection to Spark. In the future, if you use Apache mailing lists, use external services to add screenshots, and make sure that your code is formatted so other members are able to read it. On Fri, Sep 30, 2016 at 11:25 AM, chen yong wrote: > Hello All, > >

Re: Spark on yarn enviroment var

2016-10-01 Thread Vadim Semenov
The question should be addressed to the Oozie community. As far as I remember, a spark action doesn't support env variables. On Fri, Sep 30, 2016 at 8:11 PM, Saurabh Malviya (samalviy) < samal...@cisco.com> wrote: > Hi, > > > > I am running spark on yarn using oozie. > > > > When submit

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-01 Thread Vadim Semenov
> long[]. > Is it possible to force this specific operation to go off-heap so that it > can possibly use a bigger page size? > > >Babak > > *Babak Alipour,* > *University of Florida* > > On Fri, Sep 30, 2016 at 3:03 PM, Vadim Semenov < > vadim.seme...@d

Re: Restful WS for Spark

2016-10-01 Thread Vadim Semenov
, will job run in Hadoop cluster? > How stable is this API, as we will need to implement it in a production env? > Livy looks more promising but is still not mature. > Have you tested any of them? > > Thanks, > Abhishek > > On Fri, Sep 30, 2016 at 11:39

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Vadim Semenov
ad.run(Thread.java:745) > > I'm running spark in local mode so there is only one executor, the driver > and spark.driver.memory is set to 64g. Changing the driver's memory doesn't > help. > > *Babak Alipour ,* > *University of Florida* > > On Fri, Sep 30, 2016 at 2:05 P

Re: Restful WS for Spark

2016-09-30 Thread Vadim Semenov
There are two REST job servers that work with Spark: https://github.com/spark-jobserver/spark-jobserver and https://github.com/cloudera/livy. On Fri, Sep 30, 2016 at 2:07 PM, ABHISHEK wrote: > Hello all, > Have you tried accessing Spark application using Restful web-services? >

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Vadim Semenov
Can you post the whole exception stack trace? What are your executor memory settings? Right now I assume that it happens in UnsafeExternalRowSorter -> UnsafeExternalSorter:insertRecord. Running more executors with lower `spark.executor.memory` should help. On Fri, Sep 30, 2016 at 12:57 PM,

Re: using SparkILoop.run

2016-09-26 Thread Vadim Semenov
Add "-Dspark.master=local[*]" to the VM properties of your test run. On Mon, Sep 26, 2016 at 2:25 PM, Mohit Jaggi wrote: > I want to use the following API SparkILoop.run(...). I am writing a test > case as that passes some scala code to spark interpreter and receives >

Re: LIVY VS Spark Job Server

2016-09-15 Thread Vadim Semenov
I have experience with both Livy & spark-jobserver. spark-jobserver gives you a better API, particularly if you want to work within a single spark context. Livy supports submitting Python & R code, while spark-jobserver doesn't. spark-jobserver's code is more complex; it actively uses
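To illustrate the API point, a minimal sketch of a job against spark-jobserver's classic SparkJob interface (the job logic itself is hypothetical):

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

// Minimal sketch of the classic spark-jobserver API: the server owns a
// long-lived SparkContext and hands it to each submitted job, which is
// what makes working within a single context convenient.
object WordCountJob extends SparkJob {

  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid // a real job would check that config carries an "input" key

  override def runJob(sc: SparkContext, config: Config): Any = {
    val words = config.getString("input").split("\\s+")
    sc.parallelize(words).countByValue()
  }
}
```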

Dynamically change executors settings

2016-08-26 Thread Vadim Semenov
Hi spark users, I wonder if it's possible to change executor settings on-the-fly. I have the following use-case: I have a lot of non-splittable skewed files in a custom format that I read using a custom Hadoop RecordReader. These files can be small or huge, and I'd like to use only one or two cores
