Re: Spark and Kafka direct approach problem

2016-05-04 Thread أنس الليثي
A NoSuchMethodError usually means there is a mismatch between library
versions.

Check the versions of the libraries you downloaded against the version of
Spark and the version of Kafka you are running, and in particular make sure
every artifact was built for the same Scala version (e.g. 2.10 vs 2.11).
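
For reference, here is a minimal, self-contained sketch of the direct-approach
setup that example uses (the broker list and topic name are placeholders, and
the spark-streaming-kafka artifact on the classpath has to be built for the
same Scala binary version as the Spark and Kafka jars it runs against):

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class DirectKafkaCheck {

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("DirectKafkaCheck");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        // Placeholder broker list and topic name.
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "broker1:9092,broker2:9092");
        Set<String> topics = new HashSet<>(Arrays.asList("my_topic"));

        // This is the call that ends up in kafka.api.TopicMetadataRequest; it only
        // links correctly when spark-streaming-kafka, the Kafka client jars and the
        // Scala runtime all agree on the Scala binary version.
        JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                jssc,
                String.class, String.class,
                StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        stream.print();
        jssc.start();
        jssc.awaitTermination();
    }
}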

On 4 May 2016 at 16:18, Luca Ferrari  wrote:

> Hi,
>
> I’m new to Apache Spark and I’m trying to run the Spark Streaming + Kafka
> Integration Direct Approach example (JavaDirectKafkaWordCount.java).
>
> I’ve downloaded all the libraries but when I try to run I get this error
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
>
> at kafka.api.RequestKeys$.<init>(RequestKeys.scala:48)
>
> at kafka.api.RequestKeys$.<clinit>(RequestKeys.scala)
>
> at kafka.api.TopicMetadataRequest.<init>(TopicMetadataRequest.scala:55)
>
> at
> org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:122)
>
> at
> org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
>
> at
> org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
>
> at
> org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
>
> at
> org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607)
>
> at
> org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
> at it.unimi.di.luca.SimpleApp.main(SimpleApp.java:53)
>
> Any suggestions?
>
> Cheers
> Luca
>
>



-- 
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info


Re: Spark Streaming Job get killed after running for about 1 hour

2016-04-25 Thread أنس الليثي
I am using the latest Spark version, 1.6.

I have increased the maximum number of open files using this command *sysctl
-w fs.file-max=3275782*

I also increased the limit for the user who runs the Spark job by updating
the /etc/security/limits.conf file. The soft limit is 1024 and the hard limit
is 65536.

The operating system is Red Hat Enterprise Linux Server release 6.6
(Santiago)


@Rodrick: I will try to increase the assigned memory and see.

Best regards


On 24 April 2016 at 16:42, Ted Yu  wrote:

> Which version of Spark are you using ?
>
> How did you increase the open file limit ?
>
> Which operating system do you use ?
>
> Please see Example 6. ulimit Settings on Ubuntu under:
> http://hbase.apache.org/book.html#basic.prerequisites
>
> On Sun, Apr 24, 2016 at 2:34 AM, fanooos  wrote:
>
>> I have a Spark streaming job that reads a tweet stream from Gnip and writes
>> it to Kafka.
>>
>> Spark and Kafka are running on the same cluster.
>>
>> My cluster consists of 5 nodes. Kafka-b01 ... Kafka-b05
>>
>> Spark master is running on Kafka-b05.
>>
>> Here is how we submit the spark job
>>
>> *nohup sh $SPARK_HOME/bin/spark-submit --total-executor-cores 5 --class
>> org.css.java.gnipStreaming.GnipSparkStreamer --master
>> spark://kafka-b05:7077
>> GnipStreamContainer.jar powertrack
>> kafka-b01.css.org,kafka-b02.css.org,kafka-b03.css.org,kafka-b04.css.org,
>> kafka-b05.css.org
>> gnip_live_stream 2 &*
>>
>> After about 1 hour the Spark job gets killed
>>
>> The logs in the nohup file show the following exception
>>
>> org.apache.spark.storage.BlockFetchException: Failed to fetch block from
>> 2
>> locations. Most recent failure cause:
>> at
>>
>> org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:595)
>> at
>>
>> org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:585)
>> at
>>
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> at
>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> at
>> org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:585)
>> at
>> org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:570)
>> at
>> org.apache.spark.storage.BlockManager.get(BlockManager.scala:630)
>> at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:48)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>> at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>> at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: io.netty.channel.ChannelException: Unable to create Channel
>> from
>> class class io.netty.channel.socket.nio.NioSocketChannel
>> at
>>
>> io.netty.bootstrap.AbstractBootstrap$BootstrapChannelFactory.newChannel(AbstractBootstrap.java:455)
>> at
>>
>> io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:306)
>> at io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:134)
>> at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116)
>> at
>>
>> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:211)
>> at
>>
>> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
>> at
>>
>> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
>> at
>>
>> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>> at
>>
>> org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
>> at
>>
>> org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:99)
>> at
>>
>> org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:89)
>> at
>>
>> org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:588)
>> ... 15 more
>> Caused by: io.netty.channel.ChannelException: Failed to open a socket.
>> at
>>
>> io.netty.channel.socket.nio.NioSocketChannel.newSocket(NioSocketChannel.java:62)
>> at
>>
>> io.netty.channel.socket.nio.NioSocketChannel.<init>(NioSocketChannel.java:72)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>> at
>>
>> 

Re: Sending events to Kafka from spark job

2016-03-30 Thread أنس الليثي
Dear Andy,

As far as I understand, transformations are applied lazily to the RDDs, not to
the data itself, and I need to send the actual data to Kafka. Because of that,
I think I should perform at least one action (output operation) to make Spark
actually load and process the data.

Kindly correct me if I have not understood this correctly.
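
For concreteness, the kind of thing I mean is an output operation along these
lines (a minimal sketch, shown with the Java API for illustration, assuming the
kafka-clients producer is on the classpath; the broker list and topic name are
placeholders):

import java.util.Iterator;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.api.java.JavaDStream;

public class KafkaSink {

    // foreachRDD is an output operation, so it forces the pipeline to run;
    // a producer is created once per partition rather than once per record.
    public static void publish(JavaDStream<String> events) {
        events.foreachRDD((JavaRDD<String> rdd) -> {
            rdd.foreachPartition((Iterator<String> records) -> {
                Properties props = new Properties();
                props.put("bootstrap.servers", "kafka-b01:9092");  // placeholder
                props.put("key.serializer",
                        "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer",
                        "org.apache.kafka.common.serialization.StringSerializer");
                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    while (records.hasNext()) {
                        producer.send(new ProducerRecord<>("my_topic", records.next()));
                    }
                }
            });
        });
    }
}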

Best regards.

On 29 March 2016 at 19:40, Andy Davidson 
wrote:

> Hi Fanoos
>
> I would be careful about using collect(). You need to make sure your local
> computer has enough memory to hold your entire data set.
>
> Eventually I will need to do something similar. I have not written the code
> yet. My plan is to load the data into a data frame and then write a UDF
> that actually publishes to Kafka.
>
> If you are using RDDs you could use map() or some other transform to
> cause the data to be published.
>
> Andy
>
> From: fanooos 
> Date: Tuesday, March 29, 2016 at 4:26 AM
> To: "user @spark" 
> Subject: Re: Sending events to Kafka from spark job
>
> I think I found a solution, but I have no idea how this affects the execution
> of the application.
>
> At the end of the script I added a sleep statement.
>
> import time
> time.sleep(1)
>
>
> This solved the problem.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Sending-events-to-Kafka-from-spark-job-tp26622p26624.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
>


-- 
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info


Re: Apache Spark data locality when integrating with Kafka

2016-02-07 Thread أنس الليثي
Diwakar,

We have our own servers. We will not use any cloud service like Amazon's.

On 7 February 2016 at 18:24, Diwakar Dhanuskodi <
diwakar.dhanusk...@gmail.com> wrote:

> Fanoos,
> Where do you want the solution to be deployed? On premise or in the cloud?
>
> Regards
> Diwakar .
>
>
>
> Sent from Samsung Mobile.
>
>
>  Original message 
> From: "Yuval.Itzchakov" 
> Date:07/02/2016 19:38 (GMT+05:30)
> To: user@spark.apache.org
> Cc:
> Subject: Re: Apache Spark data locality when integrating with Kafka
>
> I would definitely try to avoid hosting Kafka and Spark on the same
> servers.
>
> Kafka and Spark will be doing a lot of I/O between them, so you'll want to
> maximize those resources and not share them on the same server. You'll
> want each Kafka broker to be on a dedicated server, as well as your Spark
> master and workers. If you're hosting them on Amazon EC2 instances, then
> you'll want these to be in the same availability zone, so you can benefit
> from low latency in that zone. If you're on dedicated servers,
> perhaps you'll want to create a VPC between the two clusters so you can,
> again, benefit from low I/O latency and high throughput.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-data-locality-when-integrating-with-Kafka-tp26165p26170.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


-- 
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info


Re: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver

2015-11-09 Thread أنس الليثي
If I package the application and submit it, it works fine, but I need to
run it from Eclipse.

Is there any problem with running the application from Eclipse?
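
One workaround that is sometimes suggested for running against a remote master
from inside an IDE is to point the SparkConf at a pre-built assembly jar so the
executors can still load the extra classes. A rough sketch, with a hypothetical
jar path:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class EclipseLauncher {

    public static void main(String[] args) {
        // Hypothetical path to the jar produced by mvn package; it should contain
        // spark-streaming-twitter and its dependencies, but not spark-core.
        String fatJar = "/home/user/SparkFirstTry/target/SparkFirstTry-0.0.1-SNAPSHOT-jar-with-dependencies.jar";

        SparkConf conf = new SparkConf()
                .setAppName("Simple Application")
                .setMaster("spark://rethink-node01:7077")
                // Ships the jar to the workers so the Twitter receiver classes
                // can be found when tasks are deserialized there.
                .setJars(new String[] { fatJar });

        JavaStreamingContext sc = new JavaStreamingContext(conf, new Duration(1000));
        // ... create the Twitter DStream and its output operations here,
        // exactly as in SimpleApp, then:
        sc.start();
        sc.awaitTermination();
    }
}

This is only a sketch; the assembly jar still has to be rebuilt whenever the
code changes.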



On 9 November 2015 at 12:27, Tathagata Das <t...@databricks.com> wrote:

> How are you submitting the spark application?
> You are supposed to submit the fat-jar of the application that includes the
> spark-streaming-twitter dependency (and its subdeps) but not
> spark-streaming and spark-core.
>
> On Mon, Nov 9, 2015 at 1:02 AM, أنس الليثي <dev.fano...@gmail.com> wrote:
>
>> I tried to remove Maven and add the dependencies manually (Build Path >
>> Configure Build Path > Add External JARs), but it did not work.
>>
>> I tried to create another project and copied the code from the first app,
>> but the problem is still the same.
>>
>> I even tried a different version of Eclipse, but the same problem
>> exists.
>>
>> :( :( :( :(
>>
>> On 9 November 2015 at 10:47, أنس الليثي <dev.fano...@gmail.com> wrote:
>>
>>> I tried both, the same exception still thrown
>>>
>>> On 9 November 2015 at 10:45, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> You included a very old version of the Twitter jar - 1.0.0. Did you
>>>> mean 1.5.1?
>>>>
>>>> On Mon, Nov 9, 2015 at 7:36 AM, fanooos <dev.fano...@gmail.com> wrote:
>>>> > This is my first Spark Stream application. The setup is as following
>>>> >
>>>> > 3 nodes running a spark cluster. One master node and two slaves.
>>>> >
>>>> > The application is a simple java application streaming from twitter
>>>> and
>>>> > dependencies managed by maven.
>>>> >
>>>> > Here is the code of the application
>>>> >
>>>> > public class SimpleApp {
>>>> >
>>>> > public static void main(String[] args) {
>>>> >
>>>> > SparkConf conf = new SparkConf().setAppName("Simple
>>>> > Application").setMaster("spark://rethink-node01:7077");
>>>> >
>>>> > JavaStreamingContext sc = new JavaStreamingContext(conf, new
>>>> > Duration(1000));
>>>> >
>>>> > ConfigurationBuilder cb = new ConfigurationBuilder();
>>>> >
>>>> > cb.setDebugEnabled(true).setOAuthConsumerKey("ConsumerKey")
>>>> > .setOAuthConsumerSecret("ConsumerSecret")
>>>> > .setOAuthAccessToken("AccessToken")
>>>> > .setOAuthAccessTokenSecret("TokenSecret");
>>>> >
>>>> > OAuthAuthorization auth = new OAuthAuthorization(cb.build());
>>>> >
>>>> > JavaDStream<Status> tweets = TwitterUtils.createStream(sc, auth);
>>>> >
>>>> >  JavaDStream<String> statuses = tweets.map(new Function<Status, String>() {
>>>> >  public String call(Status status) throws Exception {
>>>> > return status.getText();
>>>> > }
>>>> > });
>>>> >
>>>> >  statuses.print();;
>>>> >
>>>> >  sc.start();
>>>> >
>>>> >  sc.awaitTermination();
>>>> >
>>>> > }
>>>> >
>>>> > }
>>>> >
>>>> >
>>>> > here is the pom file
>>>> >
>>>> > <project xmlns="http://maven.apache.org/POM/4.0.0"
>>>> > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> > xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>>>> > http://maven.apache.org/xsd/maven-4.0.0.xsd">
>>>> > <modelVersion>4.0.0</modelVersion>
>>>> > <groupId>SparkFirstTry</groupId>
>>>> > <artifactId>SparkFirstTry</artifactId>
>>>> > <version>0.0.1-SNAPSHOT</version>
>>>> >
>>>> > <dependencies>
>>>> > <dependency>
>>>> > <groupId>org.apache.spark</groupId>
>>>> > <artifactId>spark-core_2.10</artifactId>
>>>> > <version>1.5.1</version>
>>>> > <scope>provided</scope>
>>>> > </dependency>
>>>> >
>>>> > <dependency>
>>>> > <groupId>org.apache.spark</groupId>
>>>> > <artifactId>spark-streaming_2.10</artifactId>
>>>> >  

Re: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver

2015-11-09 Thread أنس الليثي
I tried to remove Maven and add the dependencies manually (Build Path >
Configure Build Path > Add External JARs), but it did not work.

I tried to create another project and copied the code from the first app,
but the problem is still the same.

I even tried a different version of Eclipse, but the same problem
exists.

:( :( :( :(

On 9 November 2015 at 10:47, أنس الليثي <dev.fano...@gmail.com> wrote:

> I tried both, the same exception still thrown
>
> On 9 November 2015 at 10:45, Sean Owen <so...@cloudera.com> wrote:
>
>> You included a very old version of the Twitter jar - 1.0.0. Did you mean
>> 1.5.1?
>>
>> On Mon, Nov 9, 2015 at 7:36 AM, fanooos <dev.fano...@gmail.com> wrote:
>> > This is my first Spark Stream application. The setup is as following
>> >
>> > 3 nodes running a spark cluster. One master node and two slaves.
>> >
>> > The application is a simple java application streaming from twitter and
>> > dependencies managed by maven.
>> >
>> > Here is the code of the application
>> >
>> > public class SimpleApp {
>> >
>> > public static void main(String[] args) {
>> >
>> > SparkConf conf = new SparkConf().setAppName("Simple
>> > Application").setMaster("spark://rethink-node01:7077");
>> >
>> > JavaStreamingContext sc = new JavaStreamingContext(conf, new
>> > Duration(1000));
>> >
>> > ConfigurationBuilder cb = new ConfigurationBuilder();
>> >
>> > cb.setDebugEnabled(true).setOAuthConsumerKey("ConsumerKey")
>> > .setOAuthConsumerSecret("ConsumerSecret")
>> > .setOAuthAccessToken("AccessToken")
>> > .setOAuthAccessTokenSecret("TokenSecret");
>> >
>> > OAuthAuthorization auth = new OAuthAuthorization(cb.build());
>> >
>> > JavaDStream<Status> tweets = TwitterUtils.createStream(sc, auth);
>> >
>> >  JavaDStream<String> statuses = tweets.map(new Function<Status,
>> > String>() {
>> >  public String call(Status status) throws Exception {
>> > return status.getText();
>> > }
>> > });
>> >
>> >  statuses.print();;
>> >
>> >  sc.start();
>> >
>> >  sc.awaitTermination();
>> >
>> > }
>> >
>> > }
>> >
>> >
>> > here is the pom file
>> >
>> > <project xmlns="http://maven.apache.org/POM/4.0.0"
>> > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> > xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>> > http://maven.apache.org/xsd/maven-4.0.0.xsd">
>> > <modelVersion>4.0.0</modelVersion>
>> > <groupId>SparkFirstTry</groupId>
>> > <artifactId>SparkFirstTry</artifactId>
>> > <version>0.0.1-SNAPSHOT</version>
>> >
>> > <dependencies>
>> > <dependency>
>> > <groupId>org.apache.spark</groupId>
>> > <artifactId>spark-core_2.10</artifactId>
>> > <version>1.5.1</version>
>> > <scope>provided</scope>
>> > </dependency>
>> >
>> > <dependency>
>> > <groupId>org.apache.spark</groupId>
>> > <artifactId>spark-streaming_2.10</artifactId>
>> > <version>1.5.1</version>
>> > <scope>provided</scope>
>> > </dependency>
>> >
>> > <dependency>
>> > <groupId>org.twitter4j</groupId>
>> > <artifactId>twitter4j-stream</artifactId>
>> > <version>3.0.3</version>
>> > </dependency>
>> > <dependency>
>> > <groupId>org.apache.spark</groupId>
>> > <artifactId>spark-streaming-twitter_2.10</artifactId>
>> > <version>1.0.0</version>
>> > </dependency>
>> >
>> > </dependencies>
>> >
>> > <build>
>> > <sourceDirectory>src</sourceDirectory>
>> > <plugins>
>> > <plugin>
>> > <artifactId>maven-compiler-plugin</artifactId>
>> > <version>3.3</version>
>> > <configuration>
>> > <source>1.8</source>
>> > <target>1.8</target>
>> > </configuration>
>> > </plugin>
>> > <plugin>
>> > <artifactId>maven-assembly-plugin</artifactId>
>> > <configuration>
>> > <archive>
>> > <manifest>
>> > <mainClass>com.test.sparkTest.SimpleApp</mainClass>
>> > </manifest>
>> > </archive>
>> > <descriptorRefs>
>> > <descriptorRef>jar-with-dependencies</descriptorRef>
>> > </descriptorRefs>
>> > </configuration>
>> > </plugin>
>> >
>> > 
>> >

Spark Slave always fails to connect to master

2015-11-05 Thread أنس الليثي
I am trying to install a standalone Spark cluster. I prepared 3 virtual
machines, each with Ubuntu installed.

The three machines form a cluster with one master and two slaves.

I followed the steps in the Apache Spark documentation. I started the
master script on the master node and it worked fine.

The problem happens with the slaves.

I tried starting the slaves once using sbin/start-slave.sh on each
machine and another time using sbin/start-slaves.sh from the master node.

The workers on the slave nodes fail to start and throw the following
exception

15/11/06 02:12:36 WARN Worker: Failed to connect to master rethink-node01:7077
akka.actor.ActorNotFound: Actor not found for:
ActorSelection[Anchor(akka.tcp://sparkMaster@rethink-node01:7077/),
Path(/user/Master)]
at 
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
at 
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at 
akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
at 
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at 
akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
at 
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266)
at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:533)
at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:569)
at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:559)
at 
akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
at akka.remote.EndpointWriter.postStop(Endpoint.scala:557)
at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:411)
at 
akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
at 
akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
at akka.actor.ActorCell.terminate(ActorCell.scala:369)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)



What is the problem?


Best regards


Re: Connection PHP application to Spark Sql thrift server

2015-03-07 Thread أنس الليثي
Sorry for late reply.

I have tried to connect to the Hive server instead of the Spark SQL thrift
server, but the same exception is thrown in the Hive server logs.

The only difference is that the Hive log has a little more information than
the Spark SQL logs. The Hive server log has this message: TTransportException:
invalid status -128.

This is the only difference between the two logs.

Thanks

On 5 March 2015 at 16:15, Cheng, Hao hao.ch...@intel.com wrote:

 Can you query upon Hive? Let's confirm if it's a bug of SparkSQL in your
 PHP code first.

 -Original Message-
 From: fanooos [mailto:dev.fano...@gmail.com]
 Sent: Thursday, March 5, 2015 4:57 PM
 To: user@spark.apache.org
 Subject: Connection PHP application to Spark Sql thrift server

 We have two applications need to connect to Spark Sql thrift server.

 The first application is developed in Java. With the Spark SQL thrift server
 running, we followed the steps in this link:
 https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC
 and the application connected smoothly without any problem.


 The second application is developed in PHP. We followed the steps provided
 in this link:
 https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-PHP
 When hitting the PHP script, the Spark SQL thrift server throws this
 exception.

 15/03/05 11:53:19 ERROR TThreadPoolServer: Error occurred during
 processing of message.
 java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
 at

 org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
 at

 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.apache.thrift.transport.TTransportException
 at

 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
 at
 org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 at

 org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
 at

 org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
 at
 org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
 at

 org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
 at

 org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
 ... 4 more


 I searched a lot about this exception but I cannot figure out what the
 problem is.



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Connection-PHP-application-to-Spark-Sql-thrift-server-tp21925.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional
 commands, e-mail: user-h...@spark.apache.org




-- 
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info


Re: Connecting a PHP/Java applications to Spark SQL Thrift Server

2015-03-04 Thread أنس الليثي
Thanks very much. I used it and it works fine for me.
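
For anyone who hits this later, here is a minimal sketch of connecting through
the hive-jdbc driver (host, port, database and credentials are placeholders,
and the HiveServer2 driver class org.apache.hive.jdbc.HiveDriver is assumed to
be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerClient {

    public static void main(String[] args) throws Exception {
        // Older hive-jdbc releases are not always auto-registered, so load explicitly.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}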



On 4 March 2015 at 11:56, Arush Kharbanda ar...@sigmoidanalytics.com
wrote:

 For Java you can use the hive-jdbc connectivity jars to connect to Spark SQL.

 The driver is inside the hive-jdbc jar.

 http://hive.apache.org/javadocs/r0.11.0/api/org/apache/hadoop/hive/jdbc/HiveDriver.html




 On Wed, Mar 4, 2015 at 1:26 PM, n...@reactor8.com wrote:

 Spark SQL supports JDBC/ODBC connectivity, so if that's the route you
 wanted to connect through, you could do so from Java/PHP apps. I haven't
 used either, so I can't speak to the developer experience, but I assume it's
 pretty good, as it would be the preferred method for lots of third-party
 enterprise apps/tooling.

 If you prefer using the thrift server/interface, and client libs don't
 already exist in open-source land, you can use the thrift definitions to
 generate client libs in any supported thrift language and use those for
 connectivity. One known issue with the thrift server is when running in
 cluster mode; it seems it still exists, but the UX of the error has been
 cleaned up in 1.3:

 https://issues.apache.org/jira/browse/SPARK-5176



 -Original Message-
 From: fanooos [mailto:dev.fano...@gmail.com]
 Sent: Tuesday, March 3, 2015 11:15 PM
 To: user@spark.apache.org
 Subject: Connecting a PHP/Java applications to Spark SQL Thrift Server

 We have installed a Hadoop cluster with Hive and Spark, and the Spark SQL
 thrift server is up and running without any problem.

 Now we have a set of applications that need to use the Spark SQL thrift
 server to query some data.

 Some of these applications are Java applications and the others are PHP
 applications.

 As I am an old-fashioned Java developer, I used to connect Java
 applications to DB servers like MySQL using a JDBC driver. Is there a
 corresponding driver for connecting to the Spark SQL Thrift server? Or what
 is the library I need to use to connect to it?


 For PHP, what ways can we use to connect PHP applications to the Spark
 SQL Thrift Server?





 --
 View this message in context:

 http://apache-spark-user-list.1001560.n3.nabble.com/Connecting-a-PHP-Java-applications-to-Spark-SQL-Thrift-Server-tp21902.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional
 commands, e-mail: user-h...@spark.apache.org



 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




 --


 *Arush Kharbanda* || Technical Teamlead

 ar...@sigmoidanalytics.com || www.sigmoidanalytics.com




-- 
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info


Re: InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag

2015-02-23 Thread أنس الليثي
Hadoop version : 2.6.0
Spark Version : 1.2.1

Here is also the pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>TestSpark</groupId>
  <artifactId>TestSpark</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.2.1</version>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Best regards

On 24 February 2015 at 08:43, Ted Yu yuzhih...@gmail.com wrote:

 bq. have installed hadoop on a local virtual machine

 Can you tell us the release of hadoop you installed ?

 What Spark release are you using ? Or be more specific, what hadoop
 release was the Spark built against ?

 Cheers

 On Mon, Feb 23, 2015 at 9:37 PM, fanooos dev.fano...@gmail.com wrote:

 Hi

 I have installed hadoop on a local virtual machine using the steps from
 this
 URL


 https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-on-ubuntu-13-10

 On the local machine I wrote a little Spark application in Java to read a
 file from the Hadoop instance installed in the virtual machine.

 The code is below

 public static void main(String[] args) {

     JavaSparkContext sc = new JavaSparkContext(new
         SparkConf().setAppName("Spark Count").setMaster("local"));

     JavaRDD<String> lines =
         sc.textFile("hdfs://10.62.57.141:50070/tmp/lines.txt");
     JavaRDD<Integer> lengths = lines.flatMap(new FlatMapFunction<String,
         Integer>() {
         @Override
         public Iterable<Integer> call(String t) throws Exception {
             return Arrays.asList(t.length());
         }
     });
     List<Integer> collect = lengths.collect();
     int totalLength = lengths.reduce(new Function2<Integer, Integer,
         Integer>() {
         @Override
         public Integer call(Integer v1, Integer v2) throws Exception {
             return v1 + v2;
         }
     });
     System.out.println(totalLength);

 }


 The application throws this exception

 Exception in thread "main" java.io.IOException: Failed on local
 exception: com.google.protobuf.InvalidProtocolBufferException: Protocol
 message end-group tag did not match expected tag.; Host Details : local
 host
 is: TOSHIBA-PC/192.168.56.1; destination host is: 10.62.57.141:50070;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1351)
 at org.apache.hadoop.ipc.Client.call(Client.java:1300)
 at

 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at

 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
 at

 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
 at

 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
 at
 org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
 at

 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
 at

 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
 at

 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at

 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
 at
 org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1701)
 at
 org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1647)
 at

 org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:222)
 at

 org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
 at
 org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
 at
 org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222)
 at
 org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220)
 at scala.Option.getOrElse(Option.scala:120)
 at