Re: Running Hive and Spark together with Dynamic Resource Allocation

2016-10-28 Thread Stéphane Verlet
This works for us: yarn.nodemanager.aux-services = mapreduce_shuffle,spark_shuffle; yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler; yarn.nodemanager.aux-services.spark_shuffle.class
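For reference, the excerpt above reads like the usual yarn-site.xml entries for running the external Spark shuffle service alongside the MapReduce one. A sketch follows; the spark_shuffle class value is cut off in the excerpt, so the standard class shipped in the Spark YARN shuffle jar is shown here as an assumption:

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

Dynamic allocation additionally needs spark.shuffle.service.enabled=true and spark.dynamicAllocation.enabled=true on the Spark side, and the spark-&lt;version&gt;-yarn-shuffle.jar on the NodeManager classpath.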

java.lang.OutOfMemoryError: unable to create new native thread

2016-10-28 Thread kant kodali
"dag-scheduler-event-loop" java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:714) at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker( ForkJoinPool.java:1672) at

Re: Can I get callback notification on Spark job completion?

2016-10-28 Thread Marcelo Vanzin
On Fri, Oct 28, 2016 at 11:14 AM, Elkhan Dadashov wrote: > But if the map task will finish before the Spark job finishes, that means > SparkLauncher will go away. if the SparkLauncher handle goes away, then I > lose the ability to track the app's state, right ? > > I'm

Re: Can I get callback notification on Spark job completion?

2016-10-28 Thread Elkhan Dadashov
Hi Marcelo, Thanks for the reply. But that means the SparkAppHandle needs to stay alive until the Spark job completes. In my case I launch the Spark job from the delegator Map task in the cluster. That means the map task container needs to stay alive and wait until the Spark job completes. But if the map task will

Re: Can I get callback notification on Spark job completion?

2016-10-28 Thread Marcelo Vanzin
If you look at the "startApplication" method, it takes listeners as parameters. On Fri, Oct 28, 2016 at 10:23 AM, Elkhan Dadashov wrote: > Hi, > > I know that we can use SparkAppHandle (introduced in SparkLauncher version >>=1.6), and let the delegator map task stay alive
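For reference, a minimal sketch of the listener-based approach Marcelo points at (the jar path, main class and master are placeholders). As discussed elsewhere in this thread, the callbacks fire inside the process that holds the SparkAppHandle, so that process still has to outlive the Spark job:

    import java.util.concurrent.CountDownLatch
    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    object LauncherCallbackExample {
      def main(args: Array[String]): Unit = {
        val done = new CountDownLatch(1)

        // Invoked by the launcher whenever the app's state changes, so no polling loop is needed.
        val listener = new SparkAppHandle.Listener {
          override def stateChanged(handle: SparkAppHandle): Unit = {
            println(s"state = ${handle.getState}, appId = ${handle.getAppId}")
            if (handle.getState.isFinal()) done.countDown()
          }
          override def infoChanged(handle: SparkAppHandle): Unit = ()
        }

        val handle = new SparkLauncher()
          .setAppResource("/path/to/my-spark-job.jar")   // hypothetical jar
          .setMainClass("com.example.MyJob")             // hypothetical main class
          .setMaster("yarn")
          .startApplication(listener)

        // Block only until a terminal state (FINISHED/FAILED/KILLED) is reported.
        done.await()
      }
    }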

Can I get callback notification on Spark job completion?

2016-10-28 Thread Elkhan Dadashov
Hi, I know that we can use SparkAppHandle (introduced in SparkLauncher version >=1.6), and let the delegator map task stay alive until the Spark job finishes. But I wonder if this can be done via callback notification instead of polling. Can I get a callback notification on Spark job completion?

Re: Does the delegator map task of SparkLauncher need to stay alive until Spark job finishes ?

2016-10-28 Thread Elkhan Dadashov
I figured out that the job id returned from sparkAppHandle.getAppId() is a unique ApplicationId, which looks like this: for a local-mode Spark env: local-1477184581895; for distributed Spark mode: application_1477504900821_0005. ApplicationId represents the globally unique identifier for an application. The

Re: CSV escaping not working

2016-10-28 Thread Daniel Barclay
In any case, it seems that the current behavior is not documented sufficiently. Koert Kuipers wrote: I can see how unquoted csv would work if you escape delimiters, but I have never seen that in practice. On Thu, Oct 27, 2016 at 2:03 PM, Jain, Nishit
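For anyone reproducing this, a small sketch of the reader options involved (the path and data are hypothetical; whether escaping is honoured for unquoted fields is exactly what is in question here):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("csv-escape-check").getOrCreate()

    // "quote" and "escape" control how quoted values and embedded quote characters are parsed
    // by the built-in csv source; the thread is about delimiters escaped in unquoted fields.
    val df = spark.read
      .option("header", "true")
      .option("quote", "\"")
      .option("escape", "\\")
      .csv("/path/to/data.csv")   // hypothetical path

    df.show(5, truncate = false)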

Re: Writing to Parquet Job turns to wait mode after even completion of job

2016-10-28 Thread Chetan Khatri
Thank you, everyone. The original question was: "Every time I write to Parquet, the Spark UI shows that the stages succeeded, but in the spark shell it holds the context in wait mode for almost 10 mins; then it clears broadcast and accumulator shared variables." I don't think stopping the context can resolve the current

Re: Spark 2.0 with Hadoop 3.0?

2016-10-28 Thread Zoltán Zvara
Worked for me 2 weeks ago with a 3.0.0-alpha2 snapshot. Just changed hadoop.version while building. On Fri, Oct 28, 2016, 11:50 Sean Owen wrote: > I don't think it works, but, there is no Hadoop 3.0 right now either. As > the version implies, it's going to be somewhat
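The build-time override being described is along these lines (the profile and version string are illustrative, not a tested combination):

    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=3.0.0-alpha1 -DskipTests clean package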

Re: [SPARK 2.0.0] Specifying remote repository when submitting jobs

2016-10-28 Thread Sean Owen
https://issues.apache.org/jira/browse/SPARK-17898 On Fri, Oct 28, 2016 at 11:56 AM Aseem Bansal wrote: > Hi > > We are trying to use some of our artifacts as dependencies while > submitting spark jobs. To specify the remote artifactory URL we are using > the following

Re: [SPARK 2.0.0] Specifying remote repository when submitting jobs

2016-10-28 Thread Aseem Bansal
To add to the above: I have already checked the documentation and the API, and even looked at the source code at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L882 but I could not find anything, hence I am asking here. On Fri, Oct 28, 2016 at 4:26

[SPARK 2.0.0] Specifying remote repository when submitting jobs

2016-10-28 Thread Aseem Bansal
Hi, We are trying to use some of our artifacts as dependencies while submitting Spark jobs. To specify the remote Artifactory URL we are using the following syntax: https://USERNAME:passw...@artifactory.companyname.com/artifactory/COMPANYNAME-libs But the resolution fails, although the URL which
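The submit-time syntax in question looks roughly like this (the artifact coordinates, class and jar are placeholders; SPARK-17898, linked in the reply above, tracks problems with credentials embedded in --repositories URLs):

    spark-submit \
      --repositories "https://USERNAME:PASSWORD@artifactory.companyname.com/artifactory/COMPANYNAME-libs" \
      --packages com.companyname:my-lib_2.11:1.0.0 \
      --class com.example.Main \
      my-app.jar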

Weekly aggregation

2016-10-28 Thread Oshadha Gunawardena
Hello all, Please look into this question raised on Stack Overflow: http://stackoverflow.com/questions/40302893/apache-spark-weekly-aggregation Thanks.
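Without restating the linked question, a generic weekly-aggregation sketch in Spark SQL looks like this (the input path and column names are made up for illustration):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, sum, weekofyear, year}

    val spark = SparkSession.builder().appName("weekly-agg").getOrCreate()

    // Hypothetical input: a table with an "event_date" date column and a numeric "amount" column.
    val events = spark.read.parquet("/path/to/events")

    val weekly = events
      .groupBy(year(col("event_date")).as("year"), weekofyear(col("event_date")).as("week"))
      .agg(sum(col("amount")).as("weekly_amount"))
      .orderBy("year", "week")

    weekly.show()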

Re: Executor shutdown hook and initialization

2016-10-28 Thread Sean Owen
Have a look at this ancient JIRA for a lot more discussion about this: https://issues.apache.org/jira/browse/SPARK-650 You have exactly the same issue described by another user. For your context, your approach is sound. You can set a shutdown hook using the normal Java Runtime API. You may not
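A rough sketch of the pattern Sean describes, assuming a once-per-executor resource (the names and the resource itself are illustrative):

    import org.apache.spark.rdd.RDD

    object ExecutorSideResource {
      // Initialized at most once per executor JVM; the shutdown hook is registered at the
      // same time, using the plain Java/Scala runtime API as suggested above.
      lazy val connection: AutoCloseable = {
        val conn = new AutoCloseable {
          override def close(): Unit = println("closing executor-side resource")
        }
        sys.addShutdownHook(conn.close())
        conn
      }
    }

    // Hypothetical driver-side usage: touching the lazy val forces the init (and hook
    // registration) on each executor that processes a partition.
    def process(rdd: RDD[String]): Unit =
      rdd.foreachPartition { iter =>
        val conn = ExecutorSideResource.connection
        iter.foreach(_ => ()) // do per-record work with conn here
      }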

Re: Sharing RDDS across applications and users

2016-10-28 Thread vincent gromakowski
Bad idea. No caching, cluster over-consumption... Have a look at instantiating a custom thriftserver on temp tables with the fair scheduler to allow concurrent SQL requests. It's not a public API but you can find some examples. On Oct 28, 2016 11:12 AM, "Mich Talebzadeh"
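A sketch of what that looks like (as noted, HiveThriftServer2.startWithContext is not a stable public API; it needs the spark-hive-thriftserver module on the classpath, and the dataset path here is a placeholder):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    val spark = SparkSession.builder()
      .appName("shared-thriftserver")
      .enableHiveSupport()
      .config("spark.scheduler.mode", "FAIR")   // let concurrent JDBC queries share the context
      .getOrCreate()

    // Register whatever cached data should be visible to JDBC clients.
    val df = spark.read.parquet("/path/to/shared/data")   // hypothetical dataset
    df.cache()
    df.createOrReplaceTempView("shared_data")

    // Embed a Thrift/JDBC server in this application so its temp views (and cache) are shared.
    HiveThriftServer2.startWithContext(spark.sqlContext)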

Re: Spark 2.0 with Hadoop 3.0?

2016-10-28 Thread Sean Owen
I don't think it works, but, there is no Hadoop 3.0 right now either. As the version implies, it's going to be somewhat different API-wise. On Thu, Oct 27, 2016 at 11:04 PM adam kramer wrote: > Is the version of Spark built for Hadoop 2.7 and later only for 2.x > releases? > >

Re: LIMIT issue of SparkSQL

2016-10-28 Thread Liz Bai
Sorry for the late reply. The size of the raw data is 20 GB and it is composed of two columns. We generated it by this . The test queries are very simple: 1) select ColA from Table limit 1; 2) select ColA from Table

Re: Sharing RDDS across applications and users

2016-10-28 Thread Mich Talebzadeh
Hi, I think a tempTable is private to the session that creates it. In Hive, temp tables created by "CREATE TEMPORARY TABLE" are all private to the session; Spark is no different. The alternative may be for everyone to create the tempTable from the same DF? HTH, Dr Mich Talebzadeh

Re: Sharing RDDS across applications and users

2016-10-28 Thread Chanh Le
> Can you elaborate on how to implement the "shared sparkcontext and fair > scheduling" option? It just reuses one SparkContext by not letting it stop when the application is done. You should check livy and spark-jobserver. For FAIR scheduling see https://spark.apache.org/docs/1.2.0/job-scheduling.html
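Concretely, the FAIR-scheduler part of that setup is just configuration on the long-lived, shared context (the allocation file path and pool name below are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // Long-lived context shared by all incoming requests (what Livy / spark-jobserver manage for you).
    val conf = new SparkConf()
      .setAppName("shared-context-server")
      .set("spark.scheduler.mode", "FAIR")                                    // FAIR instead of the default FIFO
      .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")   // optional pool definitions
    val sc = new SparkContext(conf)

    // Each concurrent request can be routed to its own pool so one big job can't starve the others.
    sc.setLocalProperty("spark.scheduler.pool", "interactive")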

Re: Sharing RDDS across applications and users

2016-10-28 Thread Mich Talebzadeh
Thanks all for your advice. As I understand it, in layman's terms, if I had two applications running successfully where app 2 was dependent on app 1, I would finish app 1, store the results in HDFS, and then app 2 would start reading the results from HDFS and work on them. Using Alluxio or others replaces HDFS

convert spark dataframe to numpy (ndarray)

2016-10-28 Thread Zakaria Hili
Hi, Is there any way to convert a Spark dataframe into a numpy ndarray without using the toPandas operation? Example:
C1   C2   C3    C4
0.7  3.0  1000  10954
0.9  4.2  1200  12345
I want to get this output: [(0.7, 3.0, 1000L, 10954), (0.9, 4.2, 1200L, 12345)], dtype=[('C1', '