Get broadcast (set in one method) in another method

2018-01-25 Thread Margusja
Hi

Maybe I am overthinking this. I'd like to set a broadcast variable in method y of object A and get it in method x of the same object.

For example:

object A {

  def main(args: Array[String]) {
    y()
    x()
  }

  def x(): Unit = {
    val a = bcA.value
    ...
  }

  def y(): String = {
    val bcA = sc.broadcast(a)
    ...
    return "String value"
  }

}
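
One workaround I can think of is to keep the Broadcast handle at object scope (or have y() return it) so both methods can see it. A minimal sketch, assuming sc and the value a already exist in the real job (the setup lines and the Seq[Int] element type are just for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast

object A {

  // Illustrative setup; in the real job sc and a already exist.
  val sc = new SparkContext(new SparkConf().setAppName("bc-example").setMaster("local[*]"))
  val a  = Seq(1, 2, 3)

  // Keep the handle at object scope so x() can read what y() set.
  private var bcA: Broadcast[Seq[Int]] = _

  def y(): String = {
    bcA = sc.broadcast(a)    // set the broadcast once
    "String value"
  }

  def x(): Unit = {
    println(bcA.value)       // read it from another method
  }

  def main(args: Array[String]): Unit = {
    y()
    x()
  }
}

Having y() return the Broadcast and passing it as a parameter to x() would also work and avoid the mutable field. Is that the right approach, or is there a better way?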




---
Br
Margus

Re: run spark job in yarn cluster mode as specified user

2018-01-22 Thread Margusja
Hi

org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor requires the user to exist on each node and the right permissions to be set on the necessary directories.

Br
Margus


> On 22 Jan 2018, at 13:41, sd wang  wrote:
> 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor



Re: run spark job in yarn cluster mode as specified user

2018-01-21 Thread Margusja
Hi

One way to get it is to use the YARN configuration parameter yarn.nodemanager.container-executor.class. By default it is org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.

org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor launches containers as the user who submitted the application.
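
For example, a minimal yarn-site.xml sketch; the group value (hadoop) is only an assumption for a typical cluster, and LinuxContainerExecutor also needs the container-executor binary and container-executor.cfg set up on each node:

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>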

Br
Margus



> On 22 Jan 2018, at 09:28, sd wang  wrote:
> 
> Hi Advisers,
> When submit spark job in yarn cluster mode, the job will be executed by 
> "yarn" user. Any parameters can change the user? I tried setting 
> HADOOP_USER_NAME but it did not work. I'm using spark 2.2. 
> Thanks for any help!



Re: spark job paused(active stages finished)

2017-11-08 Thread Margusja
You have to deal with failed jobs yourself. For example, use a try/catch in your code.
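
A minimal sketch of what I mean, with placeholder names (rdd and parse stand in for your own data and logic):

import scala.util.{Failure, Success, Try}

// Wrap the action so a failing stage is handled instead of leaving the job hanging.
Try(rdd.map(parse).collect()) match {
  case Success(rows) => println(s"Got ${rows.length} rows")
  case Failure(e)    => println(s"Stage failed: ${e.getMessage}")  // handle or retry here
}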

Br Margus Roo


> On 9 Nov 2017, at 05:37, bing...@iflytek.com wrote:
> 
> Dear,All
> I have a simple spark job, as below. All tasks in stage 2 (something failed 
> and was retried) have already finished, but the next stage never runs.
> 
> 
>
> driver thread dump:  attachment( thread.dump)
> driver last log:
> 
> 
>  The driver does not receive the report for the 16 retried tasks. Thanks for any ideas.
> 
> 


[no subject]

2014-10-22 Thread Margusja

unsubscribe

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Exception failure: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2014-05-30 Thread Margusja
(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/05/30 11:53:56 INFO TaskSetManager: Starting task 8.0:0 as TID 75 on executor 0: dlvm1 (PROCESS_LOCAL)
14/05/30 11:53:56 INFO TaskSetManager: Serialized task 8.0:0 as 2975 bytes in 1 ms
14/05/30 11:53:56 INFO TaskSetManager: Finished TID 74 in 62 ms on dlvm1 (progress: 1/1)
14/05/30 11:53:56 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool

14/05/30 11:53:56 INFO DAGScheduler: Completed ResultTask(9, 1)
14/05/30 11:53:56 INFO DAGScheduler: Stage 9 (take at DStream.scala:586) finished in 0.083 s
14/05/30 11:53:56 INFO SparkContext: Job finished: take at DStream.scala:586, took 0.101449564 s

---
Time: 140144003 ms
---
...
...
...

I know that KafkaReceiver is in my package:
[root@dlvm1 margusja_kafka]# cd libs/
[root@dlvm1 libs]# jar xvf spark-
spark-assembly_2.10-0.9.1-hadoop2.2.0.jar spark-streaming_2.10-0.9.1.jar 
spark-streaming-kafka_2.10-0.9.1.jar
[root@dlvm1 libs]# jar xvf spark-streaming-kafka_2.10-0.9.1.jar | grep KafkaReceiver

 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$$anonfun$1.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$MessageHandler$$anonfun$run$2.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$$anonfun$onStart$3.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$$anonfun$onStart$1.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$MessageHandler.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$$anonfun$tryZookeeperConsumerGroupCleanup$1.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$$anonfun$onStart$5.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$$anonfun$onStart$2.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$MessageHandler$$anonfun$run$1.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$$anonfun$onStart$4.class
 inflated: org/apache/spark/streaming/kafka/KafkaReceiver$$anonfun$onStart$5$$anonfun$apply$1.class

[root@dlvm1 libs]#
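
One thing I am not sure about is whether the jar is actually shipped to the executors. A minimal sketch of passing it explicitly when creating the StreamingContext (the master URL, app name, and paths below are illustrative, and I am assuming this is the right constructor for 0.9.1):

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative values; point the jars entries at the files actually in libs/.
val ssc = new StreamingContext(
  "spark://dlvm1:7077",                  // master
  "kafka-receiver-test",                 // app name
  Seconds(2),                            // batch interval
  System.getenv("SPARK_HOME"),
  Seq("/root/margusja_kafka/libs/spark-streaming-kafka_2.10-0.9.1.jar",
      "/root/margusja_kafka/libs/spark-streaming_2.10-0.9.1.jar"))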

Any ideas?

--
Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)



Re: Announcing Spark 1.0.0

2014-05-30 Thread Margusja

Now I can download. Thanks.

Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)

On 30/05/14 13:48, Patrick Wendell wrote:

It is updated - try holding Shift + refresh in your browser, you are
probably caching the page.

On Fri, May 30, 2014 at 3:46 AM, prabeesh k prabsma...@gmail.com wrote:

Please update the http://spark.apache.org/docs/latest/  link


On Fri, May 30, 2014 at 4:03 PM, Margusja mar...@roo.ee wrote:

Is it possible to download the pre-built package?

http://mirror.symnds.com/software/Apache/incubator/spark/spark-1.0.0/spark-1.0.0-bin-hadoop2.tgz
- gives me 404

Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)


On 30/05/14 13:18, Christopher Nguyen wrote:

Awesome work, Pat et al.!

--
Christopher T. Nguyen
Co-founder & CEO, Adatao (http://adatao.com)
http://linkedin.com/in/ctnguyen




On Fri, May 30, 2014 at 3:12 AM, Patrick Wendell pwend...@gmail.com
mailto:pwend...@gmail.com wrote:

 I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0
 is a milestone release as the first in the 1.0 line of releases,
 providing API stability for Spark's core interfaces.

 Spark 1.0.0 is Spark's largest release ever, with contributions from
 117 developers. I'd like to thank everyone involved in this release -
 it was truly a community effort with fixes, features, and
 optimizations contributed from dozens of organizations.

 This release expands Spark's standard libraries, introducing a new SQL
 package (SparkSQL) which lets users integrate SQL queries into
 existing Spark workflows. MLlib, Spark's machine learning library, is
 expanded with sparse vector support and several new algorithms. The
 GraphX and Streaming libraries also introduce new features and
 optimizations. Spark's core engine adds support for secured YARN
 clusters, a unified tool for submitting Spark applications, and
 several performance and stability improvements. Finally, Spark adds
 support for Java 8 lambda syntax and improves coverage of the Java and
 Python API's.

 Those features only scratch the surface - check out the release
 notes here:
 http://spark.apache.org/releases/spark-release-1-0-0.html

 Note that since release artifacts were posted recently, certain
 mirrors may not have working downloads for a few hours.

 - Patrick