SparkContext not created due to Logger initialization

2016-12-08 Thread Adnan Ahmed
Hi,


Sometimes I get this error when I submit a Spark job. It doesn't happen every
time, but when it does, the SparkContext doesn't get created.


16/12/08 08:02:18 INFO [akka.event.slf4j.Slf4jLogger] 80==> Slf4jLogger started
error while starting up loggers
akka.ConfigurationException: Logger specified in config can't be loaded [akka.event.slf4j.Slf4jLogger] due to [akka.event.Logging$LoggerInitializationException: Logger log1-Slf4jLogger did not respond with LoggerInitialized, sent instead [TIMEOUT]]
at akka.event.LoggingBus$$anonfun$4$$anonfun$apply$1.applyOrElse(Logging.scala:116)
at akka.event.LoggingBus$$anonfun$4$$anonfun$apply$1.applyOrElse(Logging.scala:115)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at akka.event.LoggingBus$$anonfun$4.apply(Logging.scala:115)
at akka.event.LoggingBus$$anonfun$4.apply(Logging.scala:110)
at scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:722)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:721)
at akka.event.LoggingBus$class.startDefaultLoggers(Logging.scala:110)
at akka.event.EventStream.startDefaultLoggers(EventStream.scala:26)
at akka.actor.LocalActorRefProvider.init(ActorRefProvider.scala:623)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:157)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1988)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1979)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2304)


Thanks

Adnan Ahmed


Re: Storing Compressed data in HDFS into Spark

2015-10-22 Thread Adnan Haider
I believe spark.rdd.compress requires the data to be serialized. In my case,
the data is already compressed but becomes decompressed as I try to cache it.
I believe that even when I set spark.rdd.compress to true, Spark will still
decompress the data, then serialize it, and then compress the serialized
data.
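
For what it's worth, a minimal sketch of the interaction described above (assuming a
Spark 1.x-era API; names and paths are illustrative): spark.rdd.compress only applies
to blocks that are stored serialized, so the RDD has to be persisted with a *_SER
storage level for the setting to have any effect.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel

  // Sketch: enable compression of serialized cached blocks.
  val conf = new SparkConf()
    .setAppName("compressed-cache-sketch")   // illustrative app name
    .set("spark.rdd.compress", "true")
  val sc = new SparkContext(conf)

  // The input is decompressed on read; caching with MEMORY_ONLY_SER stores the
  // partitions serialized, and spark.rdd.compress then compresses those bytes.
  val data = sc.textFile("hdfs:///path/to/compressed/input")  // illustrative path
  data.persist(StorageLevel.MEMORY_ONLY_SER)
  println(data.count())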

Although Parquet is an option, I believe it only makes sense to use it when
running Spark SQL. Will it help if I am using GraphX or MLlib?

Thanks, Adnan Haider
B.S Candidate, Computer Science
Illinois Institute of Technology

On Thu, Oct 22, 2015 at 7:15 AM, Igor Berman <igor.ber...@gmail.com> wrote:

> check spark.rdd.compress
>
> On 19 October 2015 at 21:13, ahaider3 <ahaid...@hawk.iit.edu> wrote:
>
>> Hi,
>> A lot of the data I have in HDFS is compressed. I noticed that when I load
>> this data into Spark and cache it, Spark unrolls the data as usual but
>> stores it uncompressed in memory. For example, suppose data is an RDD with
>> compressed partitions on HDFS. I then cache the data. When I call
>> data.count(), the data is rightly decompressed, since count() needs the
>> actual values. But the data that gets cached is also decompressed. Can a
>> partition be compressed in Spark? I know Spark allows data to be
>> compressed after serialization. But what if I only want the partitions
>> compressed?
>>
>>
>>
>>
>>
>


Re: how to set spark.executor.memory and heap size

2014-04-24 Thread Adnan Yaqoob
You need to use the proper URL format:

file://home/wxhsdp/spark/example/standalone/README.md


On Thu, Apr 24, 2014 at 1:29 PM, wxhsdp wxh...@gmail.com wrote:

 I think maybe it's a problem with reading a local file

 val logFile = "/home/wxhsdp/spark/example/standalone/README.md"
 val logData = sc.textFile(logFile).cache()

 if I replace the above code with

 val logData = sc.parallelize(Array(1,2,3,4)).cache()

 the job completes successfully

 can't I read a file located on the local file system? does anyone know the reason?






Re: how to set spark.executor.memory and heap size

2014-04-24 Thread Adnan Yaqoob
Sorry, wrong format:

file:///home/wxhsdp/spark/example/standalone/README.md

An extra / is needed after file:// (three slashes in total).
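
A minimal sketch of the corrected read (the path is the one from the thread; assuming
the file exists at that path on every node that runs tasks):

  import org.apache.spark.{SparkConf, SparkContext}

  // file:// is the scheme, and the third slash starts the absolute path.
  val sc = new SparkContext(new SparkConf().setAppName("local-file-sketch"))
  val logFile = "file:///home/wxhsdp/spark/example/standalone/README.md"
  val logData = sc.textFile(logFile).cache()
  println(logData.count())

Note that with a non-local master the file has to be readable at the same path on
every worker; otherwise a distributed filesystem path is the safer choice.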


On Thu, Apr 24, 2014 at 1:46 PM, Adnan Yaqoob nsyaq...@gmail.com wrote:

 You need to use the proper URL format:

 file://home/wxhsdp/spark/example/standalone/README.md


 On Thu, Apr 24, 2014 at 1:29 PM, wxhsdp wxh...@gmail.com wrote:

 I think maybe it's a problem with reading a local file

 val logFile = "/home/wxhsdp/spark/example/standalone/README.md"
 val logData = sc.textFile(logFile).cache()

 if I replace the above code with

 val logData = sc.parallelize(Array(1,2,3,4)).cache()

 the job completes successfully

 can't I read a file located on the local file system? does anyone know the reason?








Re: SparkPi performance-3 cluster standalone mode

2014-04-24 Thread Adnan
Hi,

Relatively new to Spark; I have tried running the SparkPi example on a
standalone, 12-core, three-machine cluster. What I'm failing to understand is
that running this example with a single slice gives better performance than
using 12 slices. The same was the case when I was using the parallelize
function. The time scales almost linearly as each slice is added. Please
let me know if I'm doing anything wrong. The code snippet is given below:
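
(The snippet did not survive in this archive. For reference, the stock SparkPi
example amounts to roughly the following sketch, with the number of slices as the
parameter being varied; this is not necessarily the poster's exact code.)

  // Sketch of the standard SparkPi computation: estimate pi by random sampling,
  // split across `slices` tasks. Assumes an existing SparkContext `sc`.
  val slices = 12
  val n = 100000 * slices
  val count = sc.parallelize(1 to n, slices).map { _ =>
    val x = math.random * 2 - 1
    val y = math.random * 2 - 1
    if (x * x + y * y < 1) 1 else 0
  }.reduce(_ + _)
  println("Pi is roughly " + 4.0 * count / n)

With so little work per element, per-task scheduling overhead can easily dominate,
which is one common reason a single slice appears faster on such a small job.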

 

Regards,

Ahsan Ijaz





Re: how to set spark.executor.memory and heap size

2014-04-23 Thread Adnan Yaqoob
When I was testing Spark, I faced this issue. It is not related to a memory
shortage; it happens because your configuration is not correct. Try passing
your current jar to the SparkContext with SparkConf's setJars function and
try again.
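
A sketch of what that looks like (master URL and jar path are illustrative):

  import org.apache.spark.{SparkConf, SparkContext}

  // Ship the application jar to the executors so classes defined in it can be
  // loaded and deserialized on the worker side.
  val conf = new SparkConf()
    .setAppName("setjars-sketch")
    .setMaster("spark://master:7077")              // illustrative master URL
    .setJars(Seq("/path/to/your-application.jar")) // illustrative jar path
  val sc = new SparkContext(conf)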

On Thu, Apr 24, 2014 at 8:38 AM, wxhsdp wxh...@gmail.com wrote:

 by the way, the code runs OK in the Spark shell






Re: Access Last Element of RDD

2014-04-23 Thread Adnan Yaqoob
You can use the following code:

rdd.take(rdd.count().toInt)


On Thu, Apr 24, 2014 at 9:51 AM, Sai Prasanna ansaiprasa...@gmail.comwrote:

 Hi All, some help please!
 rdd.first or rdd.take(1) gives the first item. Is there a straightforward
 way to access the last element in a similar way?

 I couldn't find a tail/last method for RDD.



Re: Access Last Element of RDD

2014-04-23 Thread Adnan Yaqoob
This function returns a Scala Array; you can use its last method to get the
last element.

For example:

rdd.take(rdd.count().toInt).last
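
If collecting the whole RDD to the driver is too expensive, an alternative sketch
(assuming rdd is the RDD in question, a Spark version that has RDD.zipWithIndex,
and that the partition order is the order you care about) is to index the elements
and fetch only the last one:

  // Index every element, then keep only the element whose index is count - 1.
  val lastIndex = rdd.count() - 1
  val last = rdd.zipWithIndex()                      // pairs of (element, index)
    .filter { case (_, index) => index == lastIndex }
    .map { case (element, _) => element }
    .first()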


On Thu, Apr 24, 2014 at 10:28 AM, Sai Prasanna ansaiprasa...@gmail.comwrote:

 Adnan, but rdd.take(rdd.count().toInt) returns all the elements of the RDD.

 I only want to access the last element.


 On Thu, Apr 24, 2014 at 10:33 AM, Sai Prasanna ansaiprasa...@gmail.comwrote:

 Oh ya, Thanks Adnan.


 On Thu, Apr 24, 2014 at 10:30 AM, Adnan Yaqoob nsyaq...@gmail.comwrote:

 You can use the following code:

 rdd.take(rdd.count().toInt)


 On Thu, Apr 24, 2014 at 9:51 AM, Sai Prasanna 
 ansaiprasa...@gmail.comwrote:

 Hi All, some help please!
 rdd.first or rdd.take(1) gives the first item. Is there a straightforward
 way to access the last element in a similar way?

 I couldn't find a tail/last method for RDD.







Re: Executing spark jobs with predefined Hadoop user

2014-04-10 Thread Adnan
You need to use a proper HDFS URI with saveAsTextFile.

For example:

rdd.saveAsTextFile("hdfs://NameNode:Port/tmp/Iris/output.tmp")

Regards,
Adnan


Asaf Lahav wrote
 Hi,
 
 We are using Spark with data files on HDFS. The files are stored under a
 predefined Hadoop user (hdfs).
 
 The folder is permitted with:
 
 · read, write, and execute permission for the hdfs user
 
 · execute and read permission for users in the group
 
 · just read permission for all other users
 
 Now the Spark write operation fails due to a mismatch between the Spark
 context's user and the Hadoop permissions.
 
 Is there a way to start the Spark context with another user than the one
 configured on the local machine?
 
 Please see the technical details below:
 
 The permission on the hdfs folder /tmp/Iris is as follows:
 
 drwxr-xr-x   - hdfs  hadoop  0 2014-04-10 14:12 /tmp/Iris
 
 
 
 
 
 The Spark context is initiated on my local machine, and according to the
 configured HDFS permissions (rwxr-xr-x) there is no problem loading the
 HDFS file into an RDD:
 
 final JavaRDD<String> rdd = sparkContext.textFile(filePath);
 
 
 
 But saving the resulting RDD back to Hadoop results in a Hadoop security
 exception:
 
 rdd.saveAsTextFile("/tmp/Iris/output");
 
 
 
 Then I receive the following Hadoop security exception:
 
 org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=halbani, access=WRITE, inode=/tmp/Iris:hdfs:hadoop:drwxr-xr-x
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
   at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1428)
   at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:332)
   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126)
   at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:52)
   at org.apache.hadoop.mapred.SparkHadoopWriter.preSetup(SparkHadoopWriter.scala:65)
   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:713)
   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:686)
   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:572)
   at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:894)
   at org.apache.spark.api.java.JavaRDDLike$class.saveAsTextFile(JavaRDDLike.scala:355)
   at org.apache.spark.api.java.JavaRDD.saveAsTextFile(JavaRDD.scala:27)
   at org.apache.spark.reader.FileSpliter.split(FileSpliter.java:73)
   at org.apache.spark.reader.FileReaderMain.main(FileReaderMain.java:17)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
 Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=halbani, access=WRITE, inode=/tmp/Iris:hdfs:hadoop:drwxr-xr-x
   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:225)
   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:151)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5951)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5924)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2628)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2593)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:927)
   at sun.reflect.GeneratedMethodAccessor132.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke

Re: Executing spark jobs with predefined Hadoop user

2014-04-10 Thread Adnan
Then the problem is not on the Spark side. You have three options; choose any
one of them:

1. Change the permissions on the /tmp/Iris folder from a shell on the NameNode
with the hdfs dfs -chmod command.
2. Run your Hadoop service as the hdfs user.
3. Disable dfs.permissions in conf/hdfs-site.xml.
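
A sketch of a further workaround, assuming the cluster uses simple (non-Kerberos)
authentication; it is not one of the options above, just a commonly used trick:
tell the HDFS client which user to act as before the SparkContext touches HDFS.

  import org.apache.spark.{SparkConf, SparkContext}

  // With simple authentication the Hadoop client honors HADOOP_USER_NAME
  // (as an environment variable, and in many versions as a system property).
  System.setProperty("HADOOP_USER_NAME", "hdfs")     // or: export HADOOP_USER_NAME=hdfs
  val sc = new SparkContext(new SparkConf().setAppName("hdfs-user-sketch"))
  sc.textFile("hdfs://namenode:8020/tmp/Iris/data")  // namenode:8020 is illustrative
    .saveAsTextFile("hdfs://namenode:8020/tmp/Iris/output")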

Regards,
Adnan


avito wrote
 Thanks Adam for the quick answer. You are absolutely right.
 We are indeed using the entire HDFS URI; just for the post, I have removed
 the name node details.







How to execute a function from a class in a distributed jar on each worker node?

2014-04-08 Thread Adnan
Hello,

I am running a 4-node Cloudera cluster with 1 master and 3 slaves. I am
connecting to the Spark master from Scala using SparkContext. I am trying to
execute a simple Java function from the distributed jar on every Spark
worker, but I haven't found a way to communicate with each worker or a Spark
API function to do it.

Can somebody help me with it or point me in the right direction?

Regards,
Adnan
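
For context, a commonly suggested workaround (a sketch under assumptions, not an
answer from this thread): there is no straightforward public run-on-every-worker
API here, but a small RDD with one partition per worker lets foreachPartition
invoke code inside each executor JVM. The class and method names below are
hypothetical.

  import org.apache.spark.{SparkConf, SparkContext}

  object RunOnWorkersSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("run-on-workers-sketch"))
      val numWorkers = 3                                // assumption: 3 slave nodes
      sc.parallelize(1 to numWorkers, numWorkers)
        .foreachPartition { _ =>
          // Runs inside an executor JVM on some worker; note that Spark does not
          // guarantee exactly one partition per worker.
          com.example.MyDistributedClass.myFunction()   // hypothetical class/method
        }
      sc.stop()
    }
  }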


