Re: Typed dataset from Avro generated classes?

2020-04-20 Thread Elkhan Dadashov
Getting an exception from the Encoders.bean call: > "java.lang.UnsupportedOperationException: Cannot have circular references > in bean class, but got the circular reference of class class > org.apache.avro.Schema" > > How can I get a typed dataset from Avro generated classes? > > Thanks. > -- > Joaquín > > -- Best regards, Elkhan Dadashov
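A minimal sketch of one common workaround (an assumption here, not necessarily what this thread settled on): use a Kryo encoder instead of Encoders.bean. Kryo serializes each Avro record as an opaque binary blob, so the self-referential org.apache.avro.Schema field is never introspected, at the cost of losing the columnar schema. Names are illustrative.

    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoder;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;

    public class AvroDatasetSketch {
      // Works for any Avro-generated class: Kryo treats records as binary,
      // so the circular Schema reference that breaks Encoders.bean is never walked.
      static <T> Dataset<T> toDataset(SparkSession spark, List<T> records, Class<T> cls) {
        Encoder<T> enc = Encoders.kryo(cls); // instead of Encoders.bean(cls)
        return spark.createDataset(records, enc);
      }
    }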

Re: Does the delegator map task of SparkLauncher need to stay alive until the Spark job finishes?

2016-11-15 Thread Elkhan Dadashov
Thanks for the clarification, Marcelo. On Tue, Nov 15, 2016 at 6:20 PM Marcelo Vanzin <van...@cloudera.com> wrote: > On Tue, Nov 15, 2016 at 5:57 PM, Elkhan Dadashov <elkhan8...@gmail.com> > wrote: > > This is confusing in the sense that the client needs to stay

Re: Does the delegator map task of SparkLauncher need to stay alive until the Spark job finishes?

2016-11-15 Thread Elkhan Dadashov
at 3:07 PM Marcelo Vanzin <van...@cloudera.com> wrote: > On Tue, Oct 18, 2016 at 3:01 PM, Elkhan Dadashov <elkhan8...@gmail.com> > wrote: > > Does my map task need to wait until Spark job finishes ? > > No... > > > Or is there any way, my map task finishes after

Re: Correct SparkLauncher usage

2016-11-12 Thread Elkhan Dadashov
Hey Mohammad, I implemented the code using CountDownLatch, and SparkLauncher works as expected. Hope it helps. Whenever appHandle.getState() reaches one of the final states, the countDownLatch is decremented and execution returns to the main program. ...final CountDownLatch countDownLatch =
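A sketch of how the truncated snippet above is usually completed (app resource, main class, and master are placeholders, not values from this thread):

    import java.util.concurrent.CountDownLatch;
    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    public class LauncherWithLatch {
      public static void main(String[] args) throws Exception {
        final CountDownLatch countDownLatch = new CountDownLatch(1);
        SparkAppHandle appHandle = new SparkLauncher()
            .setAppResource("/path/to/app.jar")   // placeholder
            .setMainClass("com.example.MyApp")    // placeholder
            .setMaster("yarn")
            .startApplication(new SparkAppHandle.Listener() {
              @Override
              public void stateChanged(SparkAppHandle handle) {
                // Release the main thread once a final state
                // (FINISHED, FAILED, or KILLED) is reached.
                if (handle.getState().isFinal()) {
                  countDownLatch.countDown();
                }
              }
              @Override
              public void infoChanged(SparkAppHandle handle) { }
            });
        countDownLatch.await(); // block until the job reaches a final state
        System.out.println("Final state: " + appHandle.getState());
      }
    }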

Re: SparkDriver memory calculation mismatch

2016-11-12 Thread Elkhan Dadashov
smaller than YARN's max allowed size. > The driver just consumes what the driver consumes; I don't know of any > extra 'appmaster' component. > What do you mean by 'launched by the map task'? jobs are launched by the > driver only. > > On Sat, Nov 12, 2016 at 9:14 AM Elkhan Dadashov <

Re: SparkDriver memory calculation mismatch

2016-11-12 Thread Elkhan Dadashov
k about this value. mapreduce settings are irrelevant to Spark. Spark doesn't pay attention to the YARN settings, but YARN does. It enforces them, yes. It is not exempt from YARN. 896MB is correct there. yarn-client mode does not ignore driver properties, no. On Sat, Nov 12, 2016 at 2:18 AM Elkhan
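For reference, the 896MB figure follows from the default AM overhead rule of that era (a sketch assuming Spark's default of max(10% of AM memory, 384MB)):

    AM container = spark.yarn.am.memory + max(0.10 * 512MB, 384MB)
                 = 512MB + 384MB
                 = 896MB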

Exception not failing Python applications (in yarn client mode) - SparkLauncher says the app succeeded, whereas the app actually failed

2016-11-11 Thread Elkhan Dadashov
Hi, *Problem*: The Spark job fails, but the RM page says the job succeeded; also appHandle = sparkLauncher.startApplication() ... appHandle.getState() returns the FINISHED state - which indicates the application finished with a successful status - whereas the Spark job actually failed. *Environment*:
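For context, a minimal sketch of the client-side check being described (assuming appHandle came from startApplication()); in the buggy scenario it takes the success branch even though the Python application threw:

    SparkAppHandle.State state = appHandle.getState();
    // Expected on failure: FAILED (or KILLED); observed here: FINISHED,
    // i.e. "finished with a successful status".
    if (state == SparkAppHandle.State.FAILED || state == SparkAppHandle.State.KILLED) {
      // propagate the failure to the caller
    }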

SparkDriver memory calculation mismatch

2016-11-11 Thread Elkhan Dadashov
Hi, The Spark website lists the default Spark properties like this: I did not override any properties in the spark-defaults.conf file, but when I launch Spark in YarnClient mode: spark.driver.memory 1g spark.yarn.am.memory 512m
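For comparison, a sketch of setting those two properties explicitly through SparkLauncher rather than relying on defaults (values are illustrative; spark.yarn.am.memory only applies in yarn-client mode):

    SparkLauncher launcher = new SparkLauncher()
        .setConf(SparkLauncher.DRIVER_MEMORY, "1g") // spark.driver.memory
        .setConf("spark.yarn.am.memory", "512m");   // yarn-client AM memory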

appHandle.kill(), SparkSubmit Process, JVM questions related to SparkLauncher design and Spark Driver

2016-11-11 Thread Elkhan Dadashov
""" > This will not send a {@link #stop()} message to the application, so > it's recommended that users first try to > stop the application cleanly and only resort to this method if that fails. > """ > > So if you want to stop the application first, call stop(). >
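A sketch of the stop-then-kill pattern the quoted javadoc recommends (assuming appHandle came from startApplication(); the 30-second grace period is an arbitrary choice for illustration):

    appHandle.stop(); // ask the application to shut down cleanly
    long deadline = System.currentTimeMillis() + 30000;
    while (!appHandle.getState().isFinal()
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(500);
    }
    if (!appHandle.getState().isFinal()) {
      appHandle.kill(); // forcible: no stop() message is sent
    }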

Re: SparkLauncher 2.0.1 version working inconsistently in yarn-client mode

2016-11-10 Thread Elkhan Dadashov
countDownLatch.countDown(); } } @Override public void infoChanged(SparkAppHandle handle) {} @Override public void run() {} } On Mon, Nov 7, 2016 at 9:46 AM Marcelo Vanzin <van...@cloudera.com> wrote: > On Sat, Nov 5, 2016 at 2:54 AM, Elkhan Dadashov <

SparkLauncher 2.0.1 version working inconsistently in yarn-client mode

2016-11-05 Thread Elkhan Dadashov
Hi, I'm running Spark 2.0.1 with Spark Launcher 2.0.1 on a Yarn cluster. I launch a map task which spawns a Spark job via SparkLauncher#startApplication(). Deploy mode is yarn-client. I'm running on a Mac laptop. I have this snippet of code: SparkAppHandle appHandle =
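A sketch of how such a handle is typically obtained for yarn-client mode (Spark home, app resource, and main class are placeholders):

    SparkAppHandle appHandle = new SparkLauncher()
        .setSparkHome("/path/to/spark-2.0.1") // placeholder
        .setAppResource("/path/to/job.jar")   // placeholder
        .setMainClass("com.example.SparkJob") // placeholder
        .setMaster("yarn")
        .setDeployMode("client")              // i.e. yarn-client
        .startApplication();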

Re: Can I get a callback notification on Spark job completion?

2016-10-28 Thread Elkhan Dadashov
On Fri, Oct 28, 2016 at 10:23 AM, Elkhan Dadashov <elkhan8...@gmail.com> wrote: > Hi, > > I know that we can use SparkAppHandle (introduced in SparkLauncher version >>=1.6), and let the delegator map task stay alive until the Spark job > finishes. But I wonder if this c

Can I get a callback notification on Spark job completion?

2016-10-28 Thread Elkhan Dadashov
Hi, I know that we can use SparkAppHandle (introduced in SparkLauncher version >=1.6), and let the delegator map task stay alive until the Spark job finishes. But I wonder if this can be done via callback notification instead of polling. Can I get a callback notification on Spark job completion?

Re: Does the delegator map task of SparkLauncher need to stay alive until the Spark job finishes?

2016-10-28 Thread Elkhan Dadashov
The globally unique nature of the identifier is achieved by using the cluster timestamp, i.e. the start time of the ResourceManager, along with a monotonically increasing counter for the application. On Sat, Oct 22, 2016 at 5:18 PM Elkhan Dadashov <elkhan8...@gmail.com> wrote: > I found answer
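So an ID such as application_1479267750542_0001 (a made-up example) encodes the RM start time followed by the per-application counter; on the launcher side it can be read once YARN assigns it (assuming a SparkAppHandle from startApplication()):

    String appId = appHandle.getAppId(); // null until YARN has assigned one
    // format: application_<RM start timestamp>_<monotonic counter>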

Re: Does the delegator map task of SparkLauncher need to stay alive until the Spark job finishes?

2016-10-22 Thread Elkhan Dadashov
> What is the recommended way of getting logs and logging of Spark execution > while using sparkLauncher#startApplication()? > > Thanks. > > On Tue, Oct 18, 2016 at 3:07 PM Marcelo Vanzin <van...@cloudera.com> > wrote: > > On Tue, Oct 18, 2016 at 3:01 PM, Elkhan Dadashov <e

Re: Does the delegator map task of SparkLauncher need to stay alive until the Spark job finishes?

2016-10-22 Thread Elkhan Dadashov
#startApplication() ? Thanks. On Tue, Oct 18, 2016 at 3:07 PM Marcelo Vanzin <van...@cloudera.com> wrote: > On Tue, Oct 18, 2016 at 3:01 PM, Elkhan Dadashov <elkhan8...@gmail.com> > wrote: > > Does my map task need to wait until Spark job finishes ? > > No... > >

Does the delegator map task of SparkLauncher need to stay alive until the Spark job finishes?

2016-10-18 Thread Elkhan Dadashov
Hi, Does the delegator map task of SparkLauncher need to stay alive until the Spark job finishes? 1) Currently, I have mapper tasks which launch a Spark job via SparkLauncher#startApplication(). Does my map task need to wait until the Spark job finishes? Or is there any way my map task finishes

Re: How does the # of tasks affect # of threads?

2015-08-04 Thread Elkhan Dadashov

Re: SparkLauncher not notified about finished job - hangs infinitely.

2015-07-31 Thread Elkhan Dadashov
()) Should be spark.getOutputStream() Cheers On Fri, Jul 31, 2015 at 10:02 AM, Elkhan Dadashov elkhan8...@gmail.com wrote: Hi Tomasz, *Answer to your 1st question*: Clear/read the error (spark.getErrorStream()) and output (spark.getInputStream()) stream buffers before you call
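A sketch of the draining pattern described above, for the older launch() API that returns a raw Process (paths and class names are placeholders):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import org.apache.spark.launcher.SparkLauncher;

    public class DrainLauncherStreams {
      static void drain(InputStream in) {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
          String line;
          while ((line = r.readLine()) != null) {
            System.out.println(line); // or hand it to a logger
          }
        } catch (IOException ignored) { }
      }

      public static void main(String[] args) throws Exception {
        Process spark = new SparkLauncher()
            .setAppResource("/path/to/app.jar") // placeholder
            .setMainClass("com.example.MyApp")  // placeholder
            .launch();
        // If the stdout/stderr pipe buffers fill up, the child blocks and
        // waitFor() hangs -- exactly the symptom in this thread.
        Thread out = new Thread(() -> drain(spark.getInputStream()));
        Thread err = new Thread(() -> drain(spark.getErrorStream()));
        out.start();
        err.start();
        int exit = spark.waitFor();
        out.join();
        err.join();
        System.out.println("spark-submit exited with " + exit);
      }
    }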

Re: SparkLauncher not notified about finished job - hangs infinitely.

2015-07-31 Thread Elkhan Dadashov
(PI value is not printed out). The log of the container can be found here: http://pastebin.com/9KHi81r4 I tried to execute the submitting application both with Oracle Java 8 and 7. Any hints what might be wrong? Best regards, Tomasz -- Best regards, Elkhan Dadashov

Re: [ Potential bug ] Spark terminal logs say that job has succeeded even though job has failed in Yarn cluster mode

2015-07-28 Thread Elkhan Dadashov
, Elkhan Dadashov elkhan8...@gmail.com wrote: Any updates on this bug? Why do the Spark log results and job final status not match? (one saying that the job has failed, another stating that the job has succeeded) Thanks. On Thu, Jul 23, 2015 at 4:43 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: Hi all

Re: [ Potential bug ] Spark terminal logs say that job has succeeded even though job has failed in Yarn cluster mode

2015-07-28 Thread Elkhan Dadashov
to know about the job status. On Tue, Jul 28, 2015 at 11:17 AM, Elkhan Dadashov elkhan8...@gmail.com wrote: Thanks Corey for your answer, Do you mean that final status : SUCCEEDED in terminal logs means that YARN RM could clean the resources after the application has finished (application

Re: [ Potential bug ] Spark terminal logs say that job has succeeded even though job has failed in Yarn cluster mode

2015-07-28 Thread Elkhan Dadashov
. Have you filed a bug? On Tue, Jul 28, 2015 at 11:17 AM, Elkhan Dadashov elkhan8...@gmail.com wrote: Thanks Corey for your answer, Do you mean that final status : SUCCEEDED in terminal logs means that YARN RM could clean the resources after the application has finished (application

Re: [ Potential bug ] Spark terminal logs say that job has succeeded even though job has failed in Yarn cluster mode

2015-07-28 Thread Elkhan Dadashov
way to know about Spark job progress and final status in Java? Thanks. On Tue, Jul 28, 2015 at 1:17 PM, Corey Nolet cjno...@gmail.com wrote: On Tue, Jul 28, 2015 at 2:17 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: Thanks Corey for your answer, Do you mean that final status : SUCCEEDED

Re: [ Potential bug ] Spark terminal logs say that job has succeeded even though job has failed in Yarn cluster mode

2015-07-27 Thread Elkhan Dadashov
Any updates on this bug? Why do the Spark log results and job final status not match? (one saying that the job has failed, another stating that the job has succeeded) Thanks. On Thu, Jul 23, 2015 at 4:43 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: Hi all, While running Spark Word count python

[ Potential bug ] Spark terminal logs say that job has succeeded even though job has failed in Yarn cluster mode

2015-07-23 Thread Elkhan Dadashov
Hi all, While running the Spark word count Python example with an intentional mistake in *Yarn cluster mode*, the Spark terminal states the final status as SUCCEEDED, but the log files state correct results indicating that the job failed. Why do the terminal log output and application log output contradict each other? If
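One way to get an authoritative answer (a sketch using the Hadoop YarnClient API; the application ID is a placeholder) is to ask the ResourceManager for the final application status rather than trusting the terminal line:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.util.ConverterUtils;

    public class FinalStatusCheck {
      public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new Configuration()); // picks up yarn-site.xml on the classpath
        yarn.start();
        ApplicationId id =
            ConverterUtils.toApplicationId("application_1437000000000_0042"); // placeholder
        FinalApplicationStatus status =
            yarn.getApplicationReport(id).getFinalApplicationStatus();
        System.out.println(status); // SUCCEEDED, FAILED, or KILLED
        yarn.stop();
      }
    }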

Has anyone run a Python Spark application in Yarn-cluster mode? (which has 3rd party Python modules (i.e., numpy) to be shipped with)

2015-07-17 Thread Elkhan Dadashov
, Jun 25, 2015 at 12:55 PM, Marcelo Vanzin van...@cloudera.com wrote: Please take a look at the pull request with the actual fix; that will explain why it's the same issue. On Thu, Jun 25, 2015 at 12:51 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: Thanks Marcelo. But my case is different

Re: Command builder problem when running worker in Windows

2015-07-17 Thread Elkhan Dadashov
-hadoop2.4\bin../bin/compute-classpath.cmd ) Thank you! Julien -- Best regards, Elkhan Dadashov

Re: Command builder problem when running worker in Windows

2015-07-17 Thread Elkhan Dadashov
edited afterwards? Julien On 07/17/2015 03:00 PM, Elkhan Dadashov wrote: Run Spark with the --verbose flag to see what it read for that path. I guess on Windows, if you are using backslashes, you need two of them (\\), or just use forward slashes everywhere. On Fri, Jul 17, 2015 at 2:40 PM, Julien

Why does the SparkSubmit process take so much virtual memory in yarn-cluster mode?

2015-07-14 Thread Elkhan Dadashov
: 2303480 *bytes* Virtual Memory. Why does the SparkSubmit process take so much virtual memory in yarn-cluster mode? (which usually causes your Yarn container to be killed because of an OutOfMemory exception) On Tue, Jul 14, 2015 at 9:39 AM, Elkhan Dadashov elkhan8...@gmail.com wrote: Hi all, If you

ProcessBuilder in SparkLauncher is memory inefficient for launching a new process

2015-07-14 Thread Elkhan Dadashov
Hi all, If you want to launch a Spark job from Java in a programmatic way, then you need to use SparkLauncher. SparkLauncher uses ProcessBuilder for creating the new process - Java seems to handle process creation in an inefficient way. When you execute a process, you must first fork() and then exec().
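On Linux, one commonly suggested mitigation (an assumption here, not something this thread confirms Spark uses) is to switch the JVM's child-spawn mechanism from fork+exec to posix_spawn, which avoids the transient copy of the parent's address space:

    // JVM flag for the process that calls SparkLauncher (OpenJDK on Linux):
    //   java -Djdk.lang.Process.launchMechanism=POSIX_SPAWN -cp ... MyLauncher
    // With plain fork(), the child momentarily mirrors the parent's virtual
    // size, which inflates the numbers reported in this thread.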

Re: Why does the SparkSubmit process take so much virtual memory in yarn-cluster mode?

2015-07-14 Thread Elkhan Dadashov
, 2015 at 10:57 AM, Marcelo Vanzin van...@cloudera.com wrote: On Tue, Jul 14, 2015 at 9:53 AM, Elkhan Dadashov elkhan8...@gmail.com wrote: While the program is running, these are the stats of how much memory each process takes: SparkSubmit process : 11.266 *gigabyte* Virtual Memory

Does the Spark driver talk to the NameNode directly, or does the Yarn Resource Manager talk to the NameNode to learn which nodes have the required input blocks and inform the Spark Driver? (for launching Executors on nodes wh

2015-07-13 Thread Elkhan Dadashov
Hi folks, I have a question regarding scheduling of a Spark job on a Yarn cluster. Let's say there are 5 nodes in the Yarn cluster: A, B, C, D, E. In the Spark job I'll be reading some huge text file (sc.textFile(fileName)) from HDFS and create an RDD. Assume that only nodes A, E contain the blocks of that

Re: Does the Spark driver talk to the NameNode directly, or does the Yarn Resource Manager talk to the NameNode to learn which nodes have the required input blocks and inform the Spark Driver? (for launching Executors on node

2015-07-13 Thread Elkhan Dadashov
for the scheduling and will pick where the job runs. Look at it this way… you’re running a YARN job that runs spark. Yarn should run the job on A and E, however… if there aren’t enough free resources, it will run the job elsewhere. On Jul 13, 2015, at 10:10 AM, Elkhan Dadashov elkhan8

Re: Pyspark not working on yarn-cluster mode

2015-07-10 Thread Elkhan Dadashov

Re: Has anyone run a Python Spark application in Yarn-cluster mode? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Elkhan Dadashov
are local files, kmeans_data.txt is in HDFS. Thanks. On Thu, Jun 25, 2015 at 12:22 PM, Marcelo Vanzin van...@cloudera.com wrote: That sounds like SPARK-5479 which is not in 1.4... On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: In addition to previous emails

Re: How to run the kmeans.py Spark example in yarn-cluster?

2015-06-25 Thread Elkhan Dadashov
cluster) Thanks. On Wed, Jun 24, 2015 at 3:13 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: Hi all, I'm trying to run kmeans.py Spark example on Yarn cluster mode. I'm using Spark 1.4.0. I'm passing numpy-1.9.2.zip with --py-files flag. Here is the command I'm trying to execute

Has anyone run a Python Spark application in Yarn-cluster mode? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Elkhan Dadashov
application in Yarn-cluster mode? (which has 3rd party Python modules to be shipped with) What are the configurations or installations to be done before running a Python Spark job with 3rd party dependencies on Yarn-cluster? Thanks in advance. On Thu, Jun 25, 2015 at 12:09 PM, Elkhan Dadashov elkhan8

How to run the kmeans.py Spark example in yarn-cluster?

2015-06-24 Thread Elkhan Dadashov
Hi all, I'm trying to run the kmeans.py Spark example in Yarn cluster mode. I'm using Spark 1.4.0. I'm passing numpy-1.9.2.zip with the --py-files flag. Here is the command I'm trying to execute but it fails: ./bin/spark-submit --master yarn-cluster --verbose --py-files
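For reference, the shape of such a command (paths and the k argument are placeholders; note that numpy contains native .so extensions, which generally cannot be imported from a zip shipped via --py-files - a likely root cause here):

    ./bin/spark-submit \
      --master yarn-cluster \
      --verbose \
      --py-files /path/to/numpy-1.9.2.zip \
      examples/src/main/python/mllib/kmeans.py \
      hdfs:///path/to/kmeans_data.txt 2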

Re: What files/folders/jars does the spark-submit script depend on?

2015-06-19 Thread Elkhan Dadashov
of the distributions. -Andrew 2015-06-19 10:12 GMT-07:00 Elkhan Dadashov elkhan8...@gmail.com: Hi all, If I want to ship the spark-submit script to HDFS and then call it from the HDFS location for starting a Spark job, which other files/folders/jars need to be transferred into HDFS with the spark-submit script? Due

What files/folders/jars does the spark-submit script depend on?

2015-06-19 Thread Elkhan Dadashov
Hi all, If I want to ship the spark-submit script to HDFS and then call it from the HDFS location for starting a Spark job, which other files/folders/jars need to be transferred into HDFS with the spark-submit script? Due to some dependency issues, we can't include Spark in our Java application, so instead we

Is there a programmatic way of running a Spark job on a Yarn cluster without using the spark-submit script?

2015-06-17 Thread Elkhan Dadashov
Hi all, Is there any way of running a Spark job in a programmatic way on a Yarn cluster without using the spark-submit script? I cannot include Spark jars in my Java application (due to dependency conflicts and other reasons), so I'll be shipping the Spark assembly uber jar (spark-assembly-1.3.1-hadoop2.3.0.jar)

Re: Is there a programmatic way of running a Spark job on a Yarn cluster without using the spark-submit script?

2015-06-17 Thread Elkhan Dadashov
(); spark.waitFor(); } } } On Wed, Jun 17, 2015 at 5:51 PM, Corey Nolet cjno...@gmail.com wrote: An example of being able to do this is provided in the Spark Jetty Server project [1] [1] https://github.com/calrissian/spark-jetty-server On Wed, Jun 17, 2015 at 8:29 PM, Elkhan Dadashov
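A sketch of the launch-and-wait shape the truncated snippet suggests (SparkLauncher still drives spark-submit under the hood, but behind a Java API rather than a shell call; paths and class names are placeholders):

    Process spark = new SparkLauncher()
        .setSparkHome("/path/to/spark")     // placeholder
        .setAppResource("/path/to/app.jar") // placeholder
        .setMainClass("com.example.Main")   // placeholder
        .setMaster("yarn-cluster")
        .launch();
    spark.waitFor(); // as in the snippet above; remember to drain stdout/stderr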

Re: Spark Java API and minimum set of 3rd party dependencies

2015-06-12 Thread Elkhan Dadashov
wrote: You don't add dependencies to your app -- you mark Spark as 'provided' in the build and you rely on the deployed Spark environment to provide it. On Fri, Jun 12, 2015 at 7:14 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: Hi all, We want to integrate Spark in our Java application
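In Maven terms, that advice looks like this (a sketch; the artifact and version are illustrative for the Spark 1.3.x era discussed here):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.1</version>
      <!-- compile against Spark; the deployed cluster provides it at runtime -->
      <scope>provided</scope>
    </dependency>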

Spark Java API and minimum set of 3rd party dependencies

2015-06-12 Thread Elkhan Dadashov
Hi all, We want to integrate Spark in our Java application using the Spark Java API and run it on Yarn clusters. If I want to run Spark on Yarn, which dependencies are a must to include? I looked at Spark POM

Is it possible to see Spark jobs in the MapReduce job history? (running Spark on a YARN cluster)

2015-06-11 Thread Elkhan Dadashov
Hi all, I wonder if anyone has used the MapReduce Job History to show Spark jobs. I can see my Spark jobs (Spark running on a Yarn cluster) on the Resource Manager (RM). I start the Spark History server, and then through Spark's web-based user interface I can monitor the cluster (and track cluster and

Running SparkPi (or JavaWordCount) example fails with Job aborted due to stage failure: Task serialization failed

2015-06-08 Thread Elkhan Dadashov
Hello, Running the Spark examples fails on one machine, but succeeds in a Virtual Machine with the exact same Spark and Java versions installed. The weird part: it fails on one machine, but runs successfully on the VM. Did anyone face the same problem? Any solution tips? Thanks in advance. *Spark version*: