Re: Spark and Kafka direct approach problem
NoSuchMethodError usually appears because of a mismatch between library versions. Check the versions of the libraries you downloaded against the version of Spark and the version of Kafka.

On 4 May 2016 at 16:18, Luca Ferrari wrote:

> Hi,
>
> I’m new to Apache Spark and I’m trying to run the Spark Streaming + Kafka
> Integration Direct Approach example (JavaDirectKafkaWordCount.java).
>
> I’ve downloaded all the libraries, but when I try to run it I get this error:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
>     at kafka.api.RequestKeys$.<init>(RequestKeys.scala:48)
>     at kafka.api.RequestKeys$.<clinit>(RequestKeys.scala)
>     at kafka.api.TopicMetadataRequest.<init>(TopicMetadataRequest.scala:55)
>     at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:122)
>     at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
>     at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
>     at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
>     at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607)
>     at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
>     at it.unimi.di.luca.SimpleApp.main(SimpleApp.java:53)
>
> Any suggestions?
>
> Cheers
> Luca

--
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info
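
One common way to end up with this particular error (a missing scala.Predef$.ArrowAssoc signature) is mixing jars built for different Scala binary versions, for example a Kafka jar built for 2.11 next to Spark jars built for 2.10. The sketch below is only an illustration of how the "_2.xx" suffix in jar names could be compared; the class name, method name and jar names are hypothetical and are not taken from the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScalaSuffixCheck {
    // Matches the Scala binary-version suffix in names such as kafka_2.10-0.8.2.1.jar.
    private static final Pattern SUFFIX = Pattern.compile("_(2\\.\\d+)-");

    /** Returns the distinct Scala binary versions found in the given jar file names. */
    public static Set<String> scalaVersions(List<String> jarNames) {
        Set<String> versions = new TreeSet<>();
        for (String name : jarNames) {
            Matcher m = SUFFIX.matcher(name);
            if (m.find()) {
                versions.add(m.group(1));
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        // Hypothetical classpath contents, for illustration only.
        List<String> jars = Arrays.asList(
                "spark-streaming-kafka_2.10-1.6.1.jar",
                "kafka_2.11-0.8.2.1.jar",
                "metrics-core-2.2.0.jar");
        Set<String> versions = scalaVersions(jars);
        if (versions.size() > 1) {
            System.out.println("Mixed Scala versions on the classpath: " + versions);
        } else {
            System.out.println("Scala versions look consistent: " + versions);
        }
    }
}
```

If more than one version is reported, aligning every "_2.xx" artifact with the Scala version your Spark build uses is a reasonable first thing to try.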
Re: Spark Streaming Job get killed after running for about 1 hour
I am using the latest Spark version, 1.6.

I increased the maximum number of open files with this command:

    sysctl -w fs.file-max=3275782

I also increased the limit for the user who runs the Spark job by updating the /etc/security/limits.conf file. The soft limit is 1024 and the hard limit is 65536.

The operating system is Red Hat Enterprise Linux Server release 6.6 (Santiago).

@Rodrick: I will try increasing the assigned memory and see.

Best regards

On 24 April 2016 at 16:42, Ted Yu wrote:

> Which version of Spark are you using?
>
> How did you increase the open file limit?
>
> Which operating system do you use?
>
> Please see Example 6. ulimit Settings on Ubuntu under:
> http://hbase.apache.org/book.html#basic.prerequisites
>
> On Sun, Apr 24, 2016 at 2:34 AM, fanooos wrote:
>
>> I have a Spark Streaming job that reads a tweet stream from Gnip and
>> writes it to Kafka.
>>
>> Spark and Kafka are running on the same cluster.
>>
>> My cluster consists of 5 nodes: Kafka-b01 ... Kafka-b05.
>>
>> The Spark master is running on Kafka-b05.
>>
>> Here is how we submit the Spark job:
>>
>>     nohup sh $SPARK_HOME/bin/spark-submit --total-executor-cores 5 \
>>       --class org.css.java.gnipStreaming.GnipSparkStreamer \
>>       --master spark://kafka-b05:7077 \
>>       GnipStreamContainer.jar powertrack \
>>       kafka-b01.css.org,kafka-b02.css.org,kafka-b03.css.org,kafka-b04.css.org,kafka-b05.css.org \
>>       gnip_live_stream 2 &
>>
>> After about 1 hour the Spark job gets killed.
>>
>> The log in the nohup file shows the following exception:
>>
>> org.apache.spark.storage.BlockFetchException: Failed to fetch block from 2 locations. Most recent failure cause:
>>     at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:595)
>>     at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:585)
>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>     at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:585)
>>     at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:570)
>>     at org.apache.spark.storage.BlockManager.get(BlockManager.scala:630)
>>     at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:48)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: io.netty.channel.ChannelException: Unable to create Channel from class class io.netty.channel.socket.nio.NioSocketChannel
>>     at io.netty.bootstrap.AbstractBootstrap$BootstrapChannelFactory.newChannel(AbstractBootstrap.java:455)
>>     at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:306)
>>     at io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:134)
>>     at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116)
>>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:211)
>>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
>>     at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
>>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
>>     at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:99)
>>     at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:89)
>>     at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:588)
>>     ... 15 more
>> Caused by: io.netty.channel.ChannelException: Failed to open a socket.
>>     at io.netty.channel.socket.nio.NioSocketChannel.newSocket(NioSocketChannel.java:62)
>>     at io.netty.channel.socket.nio.NioSocketChannel.<init>(NioSocketChannel.java:72)
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>     at
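
"Failed to open a socket" is consistent with the process running out of file descriptors, although the truncated trace does not confirm it. Note that fs.file-max is a system-wide ceiling, while each process is bound by its own soft limit (1024 in the setup described above) unless that is raised. A minimal sketch for checking what a JVM actually sees, assuming a HotSpot/OpenJDK runtime on a Unix-like system where com.sun.management.UnixOperatingSystemMXBean is available; the class name FdLimitCheck is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdLimitCheck {
    /**
     * Returns {open, max} file descriptor counts for the current JVM,
     * or null when the platform does not expose them.
     */
    public static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this platform.");
        } else {
            System.out.println("Open file descriptors: " + counts[0]);
            System.out.println("Per-process limit:     " + counts[1]);
        }
    }
}
```

Running this as the same user and from the same shell or service that launches the executors shows the limit those processes would inherit; if the open count climbs steadily toward the limit over the hour, a descriptor leak is a plausible suspect.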
Re: Sending events to Kafka from spark job
Dear Andy,

As far as I understand, transformations are applied to the RDDs, not to the data, and I need to send the actual data to Kafka. So I think I need to perform at least one action to make Spark load the data.

Please correct me if I have misunderstood.

Best regards.

On 29 March 2016 at 19:40, Andy Davidson wrote:

> Hi Fanoos
>
> I would be careful about using collect(). You need to make sure your local
> computer has enough memory to hold your entire data set.
>
> Eventually I will need to do something similar. I have yet to write the
> code. My plan is to load the data into a data frame and then write a UDF
> that actually publishes to Kafka.
>
> If you are using RDDs you could use map() or some other transform to
> cause the data to be published.
>
> Andy
>
> From: fanooos
> Date: Tuesday, March 29, 2016 at 4:26 AM
> To: "user @spark"
> Subject: Re: Sending events to Kafka from spark job
>
> I think I have found a solution, but I have no idea how it affects the
> execution of the application.
>
> At the end of the script I added a sleep statement.
>
> import time
> time.sleep(1)
>
> This solved the problem.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Sending-events-to-Kafka-from-spark-job-tp26622p26624.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org

--
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info
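
The point about needing an action can be illustrated without Spark. The sketch below is an analogy only: it uses java.util.stream rather than the Spark API, and the "sink" list is a stand-in for a Kafka producer. As with a Spark transformation, the function passed to map() does not run until something terminal (in Spark, an action such as foreach or foreachPartition) consumes the pipeline. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class LazyPipelineDemo {
    /** Builds a map() pipeline but never consumes it; returns how many events reached the sink. */
    public static int sentWithoutTerminalOp(List<String> events) {
        List<String> sink = new ArrayList<>(); // stand-in for a Kafka producer
        Stream<String> pipeline = events.stream().map(e -> {
            sink.add(e); // side effect that "publishes" the event
            return e;
        });
        // No terminal operation is called on the pipeline, so the lambda never runs.
        return sink.size();
    }

    /** Same pipeline, but consumed by forEach, which forces the map() function to run. */
    public static int sentWithTerminalOp(List<String> events) {
        List<String> sink = new ArrayList<>();
        events.stream().map(e -> {
            sink.add(e);
            return e;
        }).forEach(e -> { });
        return sink.size();
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("a", "b", "c");
        System.out.println("Without terminal operation: " + sentWithoutTerminalOp(events));
        System.out.println("With terminal operation:    " + sentWithTerminalOp(events));
    }
}
```

In Spark itself, publishing from inside an action such as foreachPartition is usually preferred over side effects in map(), since it keeps the data on the executors and avoids collect() on the driver.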
Re: Apache Spark data locality when integrating with Kafka
Diwakar,

We have our own servers. We will not use any cloud service like Amazon's.

On 7 February 2016 at 18:24, Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> wrote:

> Fanoos,
> Where do you want the solution to be deployed? On premises or in the cloud?
>
> Regards
> Diwakar.
>
> Sent from Samsung Mobile.
>
> Original message
> From: "Yuval.Itzchakov"
> Date: 07/02/2016 19:38 (GMT+05:30)
> To: user@spark.apache.org
> Cc:
> Subject: Re: Apache Spark data locality when integrating with Kafka
>
> I would definitely try to avoid hosting Kafka and Spark on the same
> servers.
>
> Kafka and Spark will be doing a lot of IO between them, so you'll want to
> maximize those resources and not share them on the same server. You'll
> want each Kafka broker to be on a dedicated server, as well as your Spark
> master and workers. If you're hosting them on Amazon EC2 instances, then
> you'll want these to be in the same availability zone, so you can benefit
> from low latency within that zone. If you're on dedicated servers,
> perhaps you'll want to create a VPC between the two clusters so you can,
> again, benefit from low IO latency and high throughput.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-data-locality-when-integrating-with-Kafka-tp26165p26170.html

--
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info
Re: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver
If I package the application and submit it, it works fine, but I need to run it from Eclipse.

Is there any problem with running the application from Eclipse?

On 9 November 2015 at 12:27, Tathagata Das <t...@databricks.com> wrote:

> How are you submitting the Spark application?
> You are supposed to submit the fat jar of the application that includes the
> spark-streaming-twitter dependency (and its sub-dependencies) but not
> spark-streaming and spark-core.
>
> On Mon, Nov 9, 2015 at 1:02 AM, أنس الليثي <dev.fano...@gmail.com> wrote:
>
>> I tried removing Maven and adding the dependencies manually using
>> Build Path > Configure Build Path > Add External JARs, then adding the
>> jars manually, but it did not work.
>>
>> I tried creating another project and copying the code from the first app,
>> but the problem is still the same.
>>
>> I even tried switching to another version of Eclipse, but the same
>> problem exists.
>>
>> :( :( :( :(
>>
>> On 9 November 2015 at 10:47, أنس الليثي <dev.fano...@gmail.com> wrote:
>>
>>> I tried both; the same exception is still thrown.
>>>
>>> On 9 November 2015 at 10:45, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> You included a very old version of the Twitter jar - 1.0.0. Did you
>>>> mean 1.5.1?
>>>>
>>>> On Mon, Nov 9, 2015 at 7:36 AM, fanooos <dev.fano...@gmail.com> wrote:
>>>> > This is my first Spark Streaming application. The setup is as follows:
>>>> >
>>>> > 3 nodes running a Spark cluster. One master node and two slaves.
>>>> >
>>>> > The application is a simple Java application streaming from Twitter,
>>>> > with dependencies managed by Maven.
>>>> >
>>>> > Here is the code of the application:
>>>> >
>>>> > public class SimpleApp {
>>>> >
>>>> >     public static void main(String[] args) {
>>>> >
>>>> >         SparkConf conf = new SparkConf().setAppName("Simple Application")
>>>> >                 .setMaster("spark://rethink-node01:7077");
>>>> >
>>>> >         JavaStreamingContext sc = new JavaStreamingContext(conf, new Duration(1000));
>>>> >
>>>> >         ConfigurationBuilder cb = new ConfigurationBuilder();
>>>> >
>>>> >         cb.setDebugEnabled(true).setOAuthConsumerKey("ConsumerKey")
>>>> >                 .setOAuthConsumerSecret("ConsumerSecret")
>>>> >                 .setOAuthAccessToken("AccessToken")
>>>> >                 .setOAuthAccessTokenSecret("TokenSecret");
>>>> >
>>>> >         OAuthAuthorization auth = new OAuthAuthorization(cb.build());
>>>> >
>>>> >         JavaDStream<Status> tweets = TwitterUtils.createStream(sc, auth);
>>>> >
>>>> >         JavaDStream<String> statuses = tweets.map(new Function<Status, String>() {
>>>> >             public String call(Status status) throws Exception {
>>>> >                 return status.getText();
>>>> >             }
>>>> >         });
>>>> >
>>>> >         statuses.print();
>>>> >
>>>> >         sc.start();
>>>> >
>>>> >         sc.awaitTermination();
>>>> >
>>>> >     }
>>>> >
>>>> > }
>>>> >
>>>> >
>>>> > Here is the pom file:
>>>> >
>>>> > <project xmlns="http://maven.apache.org/POM/4.0.0"
>>>> >     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> >     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>>>> >     http://maven.apache.org/xsd/maven-4.0.0.xsd">
>>>> >   <modelVersion>4.0.0</modelVersion>
>>>> >   <groupId>SparkFirstTry</groupId>
>>>> >   <artifactId>SparkFirstTry</artifactId>
>>>> >   <version>0.0.1-SNAPSHOT</version>
>>>> >
>>>> >   <dependencies>
>>>> >     <dependency>
>>>> >       <groupId>org.apache.spark</groupId>
>>>> >       <artifactId>spark-core_2.10</artifactId>
>>>> >       <version>1.5.1</version>
>>>> >       <scope>provided</scope>
>>>> >     </dependency>
>>>> >
>>>> >     <dependency>
>>>> >       <groupId>org.apache.spark</groupId>
>>>> >       <artifactId>spark-streaming_2.10</artifactId>
>>>> >
Re: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver
I tried removing Maven and adding the dependencies manually using Build Path > Configure Build Path > Add External JARs, then adding the jars manually, but it did not work.

I tried creating another project and copying the code from the first app, but the problem is still the same.

I even tried switching to another version of Eclipse, but the same problem exists.

:( :( :( :(

On 9 November 2015 at 10:47, أنس الليثي <dev.fano...@gmail.com> wrote:

> I tried both; the same exception is still thrown.
>
> On 9 November 2015 at 10:45, Sean Owen <so...@cloudera.com> wrote:
>
>> You included a very old version of the Twitter jar - 1.0.0. Did you mean
>> 1.5.1?
>>
>> On Mon, Nov 9, 2015 at 7:36 AM, fanooos <dev.fano...@gmail.com> wrote:
>> > This is my first Spark Streaming application. The setup is as follows:
>> >
>> > 3 nodes running a Spark cluster. One master node and two slaves.
>> >
>> > The application is a simple Java application streaming from Twitter,
>> > with dependencies managed by Maven.
>> >
>> > Here is the code of the application:
>> >
>> > public class SimpleApp {
>> >
>> >     public static void main(String[] args) {
>> >
>> >         SparkConf conf = new SparkConf().setAppName("Simple Application")
>> >                 .setMaster("spark://rethink-node01:7077");
>> >
>> >         JavaStreamingContext sc = new JavaStreamingContext(conf, new Duration(1000));
>> >
>> >         ConfigurationBuilder cb = new ConfigurationBuilder();
>> >
>> >         cb.setDebugEnabled(true).setOAuthConsumerKey("ConsumerKey")
>> >                 .setOAuthConsumerSecret("ConsumerSecret")
>> >                 .setOAuthAccessToken("AccessToken")
>> >                 .setOAuthAccessTokenSecret("TokenSecret");
>> >
>> >         OAuthAuthorization auth = new OAuthAuthorization(cb.build());
>> >
>> >         JavaDStream<Status> tweets = TwitterUtils.createStream(sc, auth);
>> >
>> >         JavaDStream<String> statuses = tweets.map(new Function<Status, String>() {
>> >             public String call(Status status) throws Exception {
>> >                 return status.getText();
>> >             }
>> >         });
>> >
>> >         statuses.print();
>> >
>> >         sc.start();
>> >
>> >         sc.awaitTermination();
>> >
>> >     }
>> >
>> > }
>> >
>> >
>> > Here is the pom file:
>> >
>> > <project xmlns="http://maven.apache.org/POM/4.0.0"
>> >     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> >     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>> >     http://maven.apache.org/xsd/maven-4.0.0.xsd">
>> >   <modelVersion>4.0.0</modelVersion>
>> >   <groupId>SparkFirstTry</groupId>
>> >   <artifactId>SparkFirstTry</artifactId>
>> >   <version>0.0.1-SNAPSHOT</version>
>> >
>> >   <dependencies>
>> >     <dependency>
>> >       <groupId>org.apache.spark</groupId>
>> >       <artifactId>spark-core_2.10</artifactId>
>> >       <version>1.5.1</version>
>> >       <scope>provided</scope>
>> >     </dependency>
>> >
>> >     <dependency>
>> >       <groupId>org.apache.spark</groupId>
>> >       <artifactId>spark-streaming_2.10</artifactId>
>> >       <version>1.5.1</version>
>> >       <scope>provided</scope>
>> >     </dependency>
>> >
>> >     <dependency>
>> >       <groupId>org.twitter4j</groupId>
>> >       <artifactId>twitter4j-stream</artifactId>
>> >       <version>3.0.3</version>
>> >     </dependency>
>> >     <dependency>
>> >       <groupId>org.apache.spark</groupId>
>> >       <artifactId>spark-streaming-twitter_2.10</artifactId>
>> >       <version>1.0.0</version>
>> >     </dependency>
>> >
>> >   </dependencies>
>> >
>> >   <build>
>> >     <sourceDirectory>src</sourceDirectory>
>> >     <plugins>
>> >       <plugin>
>> >         <artifactId>maven-compiler-plugin</artifactId>
>> >         <version>3.3</version>
>> >         <configuration>
>> >           <source>1.8</source>
>> >           <target>1.8</target>
>> >         </configuration>
>> >       </plugin>
>> >       <plugin>
>> >         <artifactId>maven-assembly-plugin</artifactId>
>> >         <configuration>
>> >           <archive>
>> >             <manifest>
>> >               <mainClass>com.test.sparkTest.SimpleApp</mainClass>
>> >             </manifest>
>> >           </archive>
>> >           <descriptorRefs>
>> >             <descriptorRef>jar-with-dependencies</descriptorRef>
>> >           </descriptorRefs>
>> >         </configuration>
>> >       </plugin>
>> >     </plugins>
>> >   </build>
>> > </project>
Spark Slave always fails to connect to master
I am trying to install a standalone Spark cluster. I prepared 3 virtual machines, each with Ubuntu installed. The three machines form a cluster with one master and two slaves.

I followed the steps in the Apache Spark documentation. I ran the master start script on the master node and it worked fine. The problem happens with the slaves. I tried to start each slave once using sbin/start-slave.sh from that machine, and another time using sbin/start-slaves.sh from the master node. The workers on the slave nodes fail to start and throw the following exception:

15/11/06 02:12:36 WARN Worker: Failed to connect to master rethink-node01:7077
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster@rethink-node01:7077/), Path(/user/Master)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
    at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266)
    at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:533)
    at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:569)
    at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:559)
    at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
    at akka.remote.EndpointWriter.postStop(Endpoint.scala:557)
    at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
    at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:411)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
    at akka.actor.ActorCell.terminate(ActorCell.scala:369)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

What is the problem?

Best regards
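
The log above does not say why the master actor could not be found. Two things worth ruling out first are whether the master's hostname resolves on the worker and whether the master port accepts TCP connections from it. The sketch below is a plain-Java connectivity probe, not part of Spark; the class name MasterPortCheck is hypothetical, and the default host and port are simply the ones shown in the log.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class MasterPortCheck {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers unknown hosts, refused connections and timeouts alike.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults taken from the log in the message above; override via arguments.
        String host = args.length > 0 ? args[0] : "rethink-node01";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7077;
        boolean ok = canConnect(host, port, 3000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

If the port is reachable from the master itself but not from the workers, one possibility to check is whether the master has bound to a loopback address, for example because /etc/hosts maps its hostname to 127.0.1.1 as some Ubuntu installs do.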
Re: Connection PHP application to Spark Sql thrift server
Sorry for the late reply.

I tried connecting to the Hive server instead of Spark SQL, but the same exception is thrown in the Hive server logs.

The only difference is that the Hive log has a little more information than the Spark SQL logs. The Hive server log contains this message:

TTransportException : invalid status -128

This is the only difference between the two logs.

Thanks

On 5 March 2015 at 16:15, Cheng, Hao <hao.ch...@intel.com> wrote:

> Can you query Hive directly? Let's first confirm whether it's a bug in
> Spark SQL or in your PHP code.
>
> -----Original Message-----
> From: fanooos [mailto:dev.fano...@gmail.com]
> Sent: Thursday, March 5, 2015 4:57 PM
> To: user@spark.apache.org
> Subject: Connection PHP application to Spark Sql thrift server
>
> We have two applications that need to connect to the Spark SQL thrift server.
>
> The first application is developed in Java. With the Spark SQL thrift
> server running, we followed the steps in this link
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC
> and the application connected smoothly without any problem.
>
> The second application is developed in PHP. We followed the steps provided
> in this link
> https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-PHP .
> When hitting the PHP script, the Spark SQL thrift server throws this
> exception:
>
> 15/03/05 11:53:19 ERROR TThreadPoolServer: Error occurred during processing of message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
>     at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.thrift.transport.TTransportException
>     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
>     at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
>     at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>     at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>     ... 4 more
>
> I searched a lot about this exception but I cannot figure out what the
> problem is.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Connection-PHP-application-to-Spark-Sql-thrift-server-tp21925.html

--
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info
Re: Connecting a PHP/Java applications to Spark SQL Thrift Server
Thanks very much, I used it and it works fine for me.

On 4 March 2015 at 11:56, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:

> For Java, you can use the hive-jdbc connectivity jars to connect to
> Spark SQL.
>
> The driver is inside the hive-jdbc jar.
>
> http://hive.apache.org/javadocs/r0.11.0/api/org/apache/hadoop/hive/jdbc/HiveDriver.html
>
> On Wed, Mar 4, 2015 at 1:26 PM, n...@reactor8.com wrote:
>
>> Spark SQL supports JDBC/ODBC connectivity, so if that's the route you
>> needed or wanted to connect through, you could do so via Java/PHP apps.
>> I haven't used either, so I can't speak to the developer experience; I
>> assume it's pretty good, as it would be the preferred method for lots of
>> third-party enterprise apps and tooling.
>>
>> If you prefer using the thrift server/interface and client libraries
>> don't already exist in open source land, you can use the thrift
>> definitions to generate client libs in any supported thrift language and
>> use those for connectivity. One issue with the thrift server seems to
>> arise when running in cluster mode. It seems to still exist, but the UX
>> of the error has been cleaned up in 1.3:
>>
>> https://issues.apache.org/jira/browse/SPARK-5176
>>
>> -----Original Message-----
>> From: fanooos [mailto:dev.fano...@gmail.com]
>> Sent: Tuesday, March 3, 2015 11:15 PM
>> To: user@spark.apache.org
>> Subject: Connecting a PHP/Java applications to Spark SQL Thrift Server
>>
>> We have installed a Hadoop cluster with Hive and Spark, and the Spark SQL
>> thrift server is up and running without any problem.
>>
>> Now we have a set of applications that need to use the Spark SQL thrift
>> server to query some data.
>>
>> Some of these applications are Java applications and the others are PHP
>> applications.
>>
>> As an old-fashioned Java developer, I am used to connecting Java
>> applications to DB servers like MySQL using a JDBC driver. Is there a
>> corresponding driver for connecting to the Spark SQL thrift server? Or
>> what library do I need to use to connect to it?
>>
>> For PHP, what ways are there to connect PHP applications to the Spark SQL
>> thrift server?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Connecting-a-PHP-Java-applications-to-Spark-SQL-Thrift-Server-tp21902.html
>
> --
> [image: Sigmoid Analytics] http://htmlsig.com/www.sigmoidanalytics.com
> *Arush Kharbanda* || Technical Teamlead
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com

--
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info
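
For the Java side, a minimal JDBC sketch follows. It assumes the hive-jdbc jar and its dependencies are on the classpath, that the thrift server speaks the HiveServer2 protocol (driver class org.apache.hive.jdbc.HiveDriver, URL scheme jdbc:hive2://), and that it listens on localhost:10000 with a "default" database; the host, port, database, empty credentials and the class name ThriftServerJdbcSketch are all assumptions for illustration, not details from the thread.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ThriftServerJdbcSketch {
    /** Builds a HiveServer2-style JDBC URL, e.g. jdbc:hive2://localhost:10000/default. */
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Assumed endpoint; replace with the host and port of your thrift server.
        String url = jdbcUrl("localhost", 10000, "default");
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            // Also reached when the hive-jdbc driver is missing from the classpath.
            System.err.println("Could not query " + url + ": " + e.getMessage());
        }
    }
}
```

With older JDBC setups the driver may need to be loaded explicitly with Class.forName("org.apache.hive.jdbc.HiveDriver") before calling DriverManager.getConnection.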
Re: InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag
Hadoop version: 2.6.0
Spark version: 1.2.1

Here is the pom.xml as well:

<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>TestSpark</groupId>
  <artifactId>TestSpark</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.2.1</version>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Best regards

On 24 February 2015 at 08:43, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. have installed hadoop on a local virtual machine
>
> Can you tell us the release of Hadoop you installed?
>
> What Spark release are you using? Or, to be more specific, what Hadoop
> release was Spark built against?
>
> Cheers
>
> On Mon, Feb 23, 2015 at 9:37 PM, fanooos <dev.fano...@gmail.com> wrote:
>
>> Hi
>>
>> I have installed Hadoop on a local virtual machine using the steps from
>> this URL:
>>
>> https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-on-ubuntu-13-10
>>
>> On the local machine I wrote a little Spark application in Java to read a
>> file from the Hadoop instance installed in the virtual machine.
>>
>> The code is below:
>>
>> public static void main(String[] args) {
>>
>>     JavaSparkContext sc = new JavaSparkContext(
>>             new SparkConf().setAppName("Spark Count").setMaster("local"));
>>
>>     JavaRDD<String> lines = sc.textFile("hdfs://10.62.57.141:50070/tmp/lines.txt");
>>
>>     JavaRDD<Integer> lengths = lines.flatMap(new FlatMapFunction<String, Integer>() {
>>         @Override
>>         public Iterable<Integer> call(String t) throws Exception {
>>             return Arrays.asList(t.length());
>>         }
>>     });
>>
>>     List<Integer> collect = lengths.collect();
>>
>>     int totalLength = lengths.reduce(new Function2<Integer, Integer, Integer>() {
>>         @Override
>>         public Integer call(Integer v1, Integer v2) throws Exception {
>>             return v1+v2;
>>         }
>>     });
>>
>>     System.out.println(totalLength);
>>
>> }
>>
>> The application throws this exception:
>>
>> Exception in thread "main" java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "TOSHIBA-PC/192.168.56.1"; destination host is: "10.62.57.141":50070;
>>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>     at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>     at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>     at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1701)
>>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1647)
>>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:222)
>>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
>>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at
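
One thing that stands out in the code above is the port in the hdfs:// URI. In Hadoop 2.x, 50070 is the default NameNode web UI (HTTP) port, and an RPC client that talks to an HTTP port typically fails with exactly this kind of protobuf decoding error. The HDFS client needs the NameNode RPC address configured as fs.defaultFS in core-site.xml instead. The sketch below only flags the suspicious port; it is a plain-Java illustration with a hypothetical class name and does not use the Hadoop API.

```java
import java.net.URI;

public class HdfsUriCheck {
    // Default NameNode web UI (HTTP) port in Hadoop 2.x; it does not speak the RPC protocol.
    public static final int NAMENODE_HTTP_PORT = 50070;

    /** Returns true when an hdfs:// URI points at the default NameNode web UI port. */
    public static boolean looksLikeWebUiPort(String hdfsUri) {
        URI uri = URI.create(hdfsUri);
        return "hdfs".equals(uri.getScheme()) && uri.getPort() == NAMENODE_HTTP_PORT;
    }

    public static void main(String[] args) {
        // URI taken from the code in the message above.
        String uri = "hdfs://10.62.57.141:50070/tmp/lines.txt";
        if (looksLikeWebUiPort(uri)) {
            System.out.println(uri + " uses the NameNode web UI port;"
                    + " use the RPC port from fs.defaultFS in core-site.xml instead.");
        } else {
            System.out.println(uri + " does not use the default web UI port.");
        }
    }
}
```

The value of fs.defaultFS on the virtual machine shows which host and port the URI passed to sc.textFile() should use.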