Spark Job Failed with FileNotFoundException

2016-11-01 Thread fanooos
I have a Spark cluster consisting of 5 nodes, and a Spark job that should process some files from a directory and send their contents to Kafka. I am trying to submit the job using the following command: bin$ ./spark-submit --total-executor-cores 20 --executor-memory 5G --class

Re: Can not import KafkaProducer in spark streaming job

2016-05-02 Thread fanooos
I could solve the issue, but the solution is very weird. I ran this command: cat old_script.py > new_script.py and then submitted the job using the new script. This is the second time I have faced such an issue with a Python script, and I have no explanation for what happened. I hope this trick helps someone

Can not import KafkaProducer in spark streaming job

2016-05-01 Thread fanooos
I have a very strange problem. I wrote a Spark Streaming job that monitors an HDFS directory, reads the newly added files, and sends their contents to Kafka. The job is written in Python and you can get the code from this link: http://pastebin.com/mpKkMkph When submitting the job I got that error

Spark Streaming Job get killed after running for about 1 hour

2016-04-24 Thread fanooos
I have a Spark Streaming job that reads a stream of tweets from Gnip and writes it to Kafka. Spark and Kafka are running on the same cluster. My cluster consists of 5 nodes: Kafka-b01 ... Kafka-b05. The Spark master is running on Kafka-b05. Here is how we submit the Spark job: *nohup sh

Re: Sending events to Kafka from spark job

2016-03-29 Thread fanooos
I think I found a solution, but I have no idea how it affects the execution of the application. At the end of the script I added a sleep statement: import time; time.sleep(1) This solved the problem.
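A plausible explanation, assuming the script uses the kafka-python client (the `producer.send(...)` call in the original message is consistent with it): `send()` only queues a record in an in-memory buffer and returns immediately, so a script that exits right after the last `send()` can terminate before the background sender has delivered anything. The sleep merely gives that sender time to run. A more deterministic alternative is to call `flush()`, which blocks until the buffered records have been sent. The sketch below is not the poster's code; `send_all` and `FakeProducer` are hypothetical names introduced only so the example can run without a broker.

```python
def send_all(producer, topic, lines):
    """Queue every line on the producer, then block until the buffer is drained."""
    for line in lines:
        # send() is asynchronous: it only appends the record to a buffer.
        producer.send(topic, value=line.encode("utf-8"))
    # flush() blocks until all buffered records have been handed off,
    # which replaces the time.sleep(1) workaround.
    producer.flush()


class FakeProducer:
    """Hypothetical stand-in that mimics buffered sending for a dry run."""

    def __init__(self):
        self.buffered = []   # records queued by send() but not yet delivered
        self.delivered = []  # records "delivered" once flush() has run

    def send(self, topic, value):
        self.buffered.append((topic, value))

    def flush(self):
        self.delivered.extend(self.buffered)
        self.buffered.clear()


if __name__ == "__main__":
    producer = FakeProducer()
    send_all(producer, "testTopic", ["first tweet", "second tweet"])
    print(len(producer.delivered), "records delivered")
```

With a real broker, the same `send_all` function would be passed a `kafka.KafkaProducer` instance instead of the stand-in.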

Sending events to Kafka from spark job

2016-03-29 Thread fanooos
rap_servers="10.62.54.111:9092") tweets = sc.textFile("/home/fanooos/Desktop/historical_scripts/output/1/activities_201603270430_201603270440.json") tweetsCollection = tweets.collect() for tweet in tweetsCollection: producer.send('testTopic', value=bytes(twe

Apache Spark data locality when integrating with Kafka

2016-02-06 Thread fanooos
Dears, if I use Kafka as a streaming source for some Spark jobs, is it advisable to install Spark on the same nodes as the Kafka cluster? What are the benefits and drawbacks of such a decision? Regards

java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver

2015-11-08 Thread fanooos
This is my first Spark Streaming application. The setup is as follows: 3 nodes running a Spark cluster, with one master node and two slaves. The application is a simple Java application that streams from Twitter, with dependencies managed by Maven. Here is the code of the application: public class

Spark sql thrift server slower than hive

2015-03-22 Thread fanooos
We have Cloudera CDH 5.3 installed on one machine. We are trying to use the Spark SQL Thrift server to execute some analysis queries against a Hive table. Without any changes to the configuration, we ran the following query on both Hive and the Spark SQL Thrift server: *select * from tableName;* The

org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException

2015-03-17 Thread fanooos
I have a Hadoop cluster and I need to query the data stored on HDFS using the Spark SQL Thrift server. The Spark SQL Thrift server is up and running. It is configured to read from a Hive table. The Hive table is an external table that corresponds to a set of files stored on HDFS. These files contain

Is there any problem in having a long opened connection to spark sql thrift server

2015-03-09 Thread fanooos
I have some applications developed in PHP, and we currently have a problem connecting these applications to the Spark SQL Thrift server. (Here is the problem I am talking about: http://apache-spark-user-list.1001560.n3.nabble.com/Connection-PHP-application-to-Spark-Sql-thrift-server-td21925.html

Connection PHP application to Spark Sql thrift server

2015-03-05 Thread fanooos
We have two applications that need to connect to the Spark SQL Thrift server. The first application is developed in Java. With the Spark SQL Thrift server running, we followed the steps in this link: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC and the

Connecting a PHP/Java applications to Spark SQL Thrift Server

2015-03-03 Thread fanooos
We have installed a Hadoop cluster with Hive and Spark, and the Spark SQL Thrift server is up and running without any problem. Now we have a set of applications that need to use the Spark SQL Thrift server to query some data. Some of these applications are Java applications and the others are PHP

Spark SQL Thrift Server start exception : java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2015-03-02 Thread fanooos
I have installed a Hadoop cluster (version 2.6.0), Apache Spark (version 1.2.1, prebuilt for Hadoop 2.4 and later), and Hive (version 1.0.0). When I try to start the Spark SQL Thrift server I get the following exception: Exception in thread main java.lang.RuntimeException:

InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag

2015-02-23 Thread fanooos
Hi, I have installed Hadoop on a local virtual machine using the steps from this URL: https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-on-ubuntu-13-10 On the local machine I wrote a small Spark application in Java to read a file from the Hadoop instance installed in the