Re: Spark ml.ALS question -- RegressionEvaluator.evaluate giving ~1.5 output for same train and predict data
Ping. Does anyone have suggestions or advice for me? It would be really helpful. VG On Sun, Jul 24, 2016 at 12:19 AM, VG wrote: > Sean, > > I did this just to test the model. When I do a split of my data as > training to 80% and test to be 20% > > I get a Root-mean-square error = NaN > > So I am wondering where I might be going wrong > > Regards, > VG > > On Sun, Jul 24, 2016 at 12:12 AM, Sean Owen wrote: > >> No, that's certainly not to be expected. ALS works by computing a much >> lower-rank representation of the input. It would not reproduce the >> input exactly, and you don't want it to -- this would be seriously >> overfit. This is why in general you don't evaluate a model on the >> training set. >> >> On Sat, Jul 23, 2016 at 7:37 PM, VG wrote: >> > I am trying to run ml.ALS to compute some recommendations. >> > >> > Just to test I am using the same dataset for training using ALSModel >> and for >> > predicting the results based on the model. >> > >> > When I evaluate the result using RegressionEvaluator I get a >> > Root-mean-square error = 1.5544064263236066 >> > >> > I think this should be 0. Any suggestions on what might be going wrong? >> > >> > Regards, >> > Vipul >> > >
Re: Spark ml.ALS question -- RegressionEvaluator.evaluate giving ~1.5 output for same train and predict data
Any suggestions or ideas here? On Sun, Jul 24, 2016 at 12:19 AM, VG wrote: > Sean, > > I did this just to test the model. When I do a split of my data as > training to 80% and test to be 20% > > I get a Root-mean-square error = NaN > > So I am wondering where I might be going wrong > > Regards, > VG > > On Sun, Jul 24, 2016 at 12:12 AM, Sean Owen wrote: > >> No, that's certainly not to be expected. ALS works by computing a much >> lower-rank representation of the input. It would not reproduce the >> input exactly, and you don't want it to -- this would be seriously >> overfit. This is why in general you don't evaluate a model on the >> training set. >> >> On Sat, Jul 23, 2016 at 7:37 PM, VG wrote: >> > I am trying to run ml.ALS to compute some recommendations. >> > >> > Just to test I am using the same dataset for training using ALSModel >> and for >> > predicting the results based on the model. >> > >> > When I evaluate the result using RegressionEvaluator I get a >> > Root-mean-square error = 1.5544064263236066 >> > >> > I think this should be 0. Any suggestions on what might be going wrong? >> > >> > Regards, >> > Vipul >> > >
Re: Spark ml.ALS question -- RegressionEvaluator.evaluate giving ~1.5 output for same train and predict data
Sean, I did this just to test the model. When I split my data into 80% training and 20% test, I get a Root-mean-square error = NaN, so I am wondering where I might be going wrong. Regards, VG On Sun, Jul 24, 2016 at 12:12 AM, Sean Owen wrote: > No, that's certainly not to be expected. ALS works by computing a much > lower-rank representation of the input. It would not reproduce the > input exactly, and you don't want it to -- this would be seriously > overfit. This is why in general you don't evaluate a model on the > training set. > > On Sat, Jul 23, 2016 at 7:37 PM, VG wrote: > > I am trying to run ml.ALS to compute some recommendations. > > > > Just to test I am using the same dataset for training using ALSModel and > for > > predicting the results based on the model. > > > > When I evaluate the result using RegressionEvaluator I get a > > Root-mean-square error = 1.5544064263236066 > > > > I think this should be 0. Any suggestions on what might be going wrong? > > > > Regards, > > Vipul >
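[Editor's note] The evaluation flow discussed in this thread can be sketched as below. This is a hypothetical sketch, not VG's actual code: the `ratings` dataset and the column names `userId`, `itemId`, and `rating`, as well as the parameter values, are all assumptions. The NaN RMSE on a random 80/20 split typically comes from users or items that appear only in the test split; ALS has no learned factors for them, so it predicts NaN, and a single NaN makes the overall RMSE NaN. Dropping those rows before evaluating yields a finite number:

```java
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Assumes an existing Dataset<Row> `ratings` with columns userId, itemId, rating.
Dataset<Row>[] splits = ratings.randomSplit(new double[]{0.8, 0.2});
Dataset<Row> training = splits[0];
Dataset<Row> test = splits[1];

ALS als = new ALS()
    .setMaxIter(10)          // illustrative values, not tuned
    .setRegParam(0.1)
    .setUserCol("userId")
    .setItemCol("itemId")
    .setRatingCol("rating");
ALSModel model = als.fit(training);

// Cold-start users/items (present in test but not in training) get NaN
// predictions, which poison the RMSE; filter them out before evaluating.
Dataset<Row> predictions = model.transform(test)
    .filter("NOT isnan(prediction)");

RegressionEvaluator evaluator = new RegressionEvaluator()
    .setMetricName("rmse")
    .setLabelCol("rating")
    .setPredictionCol("prediction");
System.out.println("RMSE = " + evaluator.evaluate(predictions));
```

On Spark 2.2 and later, `als.setColdStartStrategy("drop")` performs this filtering automatically; on 2.0.0-preview, as used in this thread, the manual filter is needed.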
Re: Error in collecting RDD as a Map - IOException in collectAsMap
Hi Pedro, Based on your suggestion, I deployed this on an AWS node and it worked fine. Thanks for your advice. I am still trying to figure out the issues in the local environment. Anyway, thanks again. -VG On Sat, Jul 23, 2016 at 9:26 PM, Pedro Rodriguez wrote: > Have you changed spark-env.sh or spark-defaults.conf from the default? It > looks like spark is trying to address local workers based on a network > address (eg 192.168……) instead of on localhost (localhost, 127.0.0.1, > 0.0.0.0,…). Additionally, that network address doesn’t resolve correctly. > You might also check /etc/hosts to make sure that you don’t have anything > weird going on. > > Last thing to try perhaps is that are you running Spark within a VM and/or > Docker? If networking isn’t setup correctly on those you may also run into > trouble. > > What would be helpful is to know everything about your setup that might > affect networking. > > — > Pedro Rodriguez > PhD Student in Large-Scale Machine Learning | CU Boulder > Systems Oriented Data Scientist > UC Berkeley AMPLab Alumni > > pedrorodriguez.io | 909-353-4423 > github.com/EntilZha | LinkedIn > <https://www.linkedin.com/in/pedrorodriguezscience> > > On July 23, 2016 at 9:10:31 AM, VG (vlin...@gmail.com) wrote: > > Hi Pedro, > > Apologies for not adding this earlier. > > This is running on a local cluster set up as follows. > JavaSparkContext jsc = new JavaSparkContext("local[2]", "DR"); > > Any suggestions based on this? > > The ports are not blocked by firewall. > > Regards, > > > > On Sat, Jul 23, 2016 at 8:35 PM, Pedro Rodriguez > wrote: > >> Make sure that you don’t have ports firewalled. You don’t really give >> much information to work from, but it looks like the master can’t access >> the worker nodes for some reason. If you give more information on the >> cluster, networking, etc, it would help. >> >> For example, on AWS you can create a security group which allows all >> traffic to/from itself to itself. 
If you are using something like ufw on >> ubuntu then you probably need to know the ip addresses of the worker nodes >> beforehand. >> >> — >> Pedro Rodriguez >> PhD Student in Large-Scale Machine Learning | CU Boulder >> Systems Oriented Data Scientist >> UC Berkeley AMPLab Alumni >> >> pedrorodriguez.io | 909-353-4423 >> github.com/EntilZha | LinkedIn >> <https://www.linkedin.com/in/pedrorodriguezscience> >> >> On July 23, 2016 at 7:38:01 AM, VG (vlin...@gmail.com) wrote: >> >> Please suggest if I am doing something wrong or an alternative way of >> doing this. >> >> I have an RDD with two values as follows >> JavaPairRDD rdd >> >> When I execute rdd..collectAsMap() >> it always fails with IO exceptions. >> >> >> 16/07/23 19:03:58 ERROR RetryingBlockFetcher: Exception while beginning >> fetch of 1 outstanding blocks >> java.io.IOException: Failed to connect to /192.168.1.3:58179 >> at >> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228) >> at >> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179) >> at >> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:96) >> at >> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140) >> at >> org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120) >> at >> org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:105) >> at >> org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:92) >> at >> org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:546) >> at >> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:76) >> at >> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) >> at >> 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) >> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) >> at >> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) >> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) >> at java.lang.Thread.run(Unknown
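[Editor's note] Moving to an AWS node sidesteps the underlying problem Pedro describes: the driver advertises a LAN address (here 192.168.1.3) that is not actually reachable, e.g. from inside a VM or Docker, or after a network change. A hedged sketch of the usual local workarounds (the property names are real Spark settings; the class and jar names are placeholders):

```shell
# Pin Spark's networking to the loopback interface so local[*] runs don't
# depend on a possibly-unroutable LAN address such as 192.168.1.3.
export SPARK_LOCAL_IP=127.0.0.1        # e.g. in conf/spark-env.sh

# Or per application:
spark-submit \
  --conf spark.driver.host=127.0.0.1 \
  --class com.example.MyApp \
  myapp.jar
```

As Pedro notes, it is also worth checking /etc/hosts so the machine's hostname resolves to an address that is actually reachable.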
Spark ml.ALS question -- RegressionEvaluator.evaluate giving ~1.5 output for same train and predict data
I am trying to run ml.ALS to compute some recommendations. Just to test, I am using the same dataset for training using ALSModel and for predicting the results based on the model. When I evaluate the result using RegressionEvaluator I get a Root-mean-square error = 1.5544064263236066. I think this should be 0. Any suggestions on what might be going wrong? Regards, Vipul
Re: Error in collecting RDD as a Map - IOException in collectAsMap
Hi Pedro, Apologies for not adding this earlier. This is running on a local cluster set up as follows: JavaSparkContext jsc = new JavaSparkContext("local[2]", "DR"); Any suggestions based on this? The ports are not blocked by a firewall. Regards, On Sat, Jul 23, 2016 at 8:35 PM, Pedro Rodriguez wrote: > Make sure that you don’t have ports firewalled. You don’t really give much > information to work from, but it looks like the master can’t access the > worker nodes for some reason. If you give more information on the cluster, > networking, etc, it would help. > > For example, on AWS you can create a security group which allows all > traffic to/from itself to itself. If you are using something like ufw on > ubuntu then you probably need to know the ip addresses of the worker nodes > beforehand. > > — > Pedro Rodriguez > PhD Student in Large-Scale Machine Learning | CU Boulder > Systems Oriented Data Scientist > UC Berkeley AMPLab Alumni > > pedrorodriguez.io | 909-353-4423 > github.com/EntilZha | LinkedIn > <https://www.linkedin.com/in/pedrorodriguezscience> > > On July 23, 2016 at 7:38:01 AM, VG (vlin...@gmail.com) wrote: > > Please suggest if I am doing something wrong or an alternative way of > doing this. > > I have an RDD with two values as follows > JavaPairRDD rdd > > When I execute rdd.collectAsMap() > it always fails with IO exceptions. 
> > > 16/07/23 19:03:58 ERROR RetryingBlockFetcher: Exception while beginning > fetch of 1 outstanding blocks > java.io.IOException: Failed to connect to /192.168.1.3:58179 > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228) > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179) > at > org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:96) > at > org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140) > at > org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120) > at > org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:105) > at > org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:92) > at > org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:546) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:76) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > Caused by: java.net.ConnectException: Connection timed out: no further > information: /192.168.1.3:58179 > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) > at > 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224) > at > io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > ... 1 more > 16/07/23 19:03:58 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 > outstanding blocks after 5000 ms > > > >
Error in collecting RDD as a Map - IOException in collectAsMap
Please suggest if I am doing something wrong, or an alternative way of doing this. I have an RDD with two values as follows: JavaPairRDD rdd When I execute rdd.collectAsMap() it always fails with IO exceptions. 16/07/23 19:03:58 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks java.io.IOException: Failed to connect to /192.168.1.3:58179 at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179) at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:96) at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140) at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120) at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:105) at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:92) at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:546) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:76) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.net.ConnectException: Connection timed out: no further information: /192.168.1.3:58179 at sun.nio.ch.SocketChannelImpl.checkConnect(Native 
Method) at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ... 1 more 16/07/23 19:03:58 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms
How to search on a Dataset / RDD
Any suggestions here, please? I basically need the ability to look up *name -> index* and *index -> name* in the code. -VG On Fri, Jul 22, 2016 at 6:40 PM, VG wrote: > Hi All, > > I am really confused how to proceed further. Please help. > > I have a dataset created as follows: > Dataset b = sqlContext.sql("SELECT bid, name FROM business"); > > Now I need to map each name with a unique index and I did the following > JavaPairRDD indexedBId = business.javaRDD() >.zipWithIndex(); > > In a later part of the code I need to change a data structure and update the name > with the index value generated above. > I am unable to figure out how to do a lookup here. > > Please suggest. > > If there is a better way to do this please suggest that. > > Regards > VG > >
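[Editor's note] Once the pair RDD produced by zipWithIndex is collected to the driver (e.g. with collectAsMap()), the two-way lookup itself is plain Java: keep the name -> index map and invert it once for index -> name. A minimal driver-side sketch (the sample names are made up; in the real code the first map would come from the collected RDD):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NameIndexLookup {
    // Build name -> index, mirroring what zipWithIndex assigns (0-based, in order).
    public static Map<String, Long> nameToIndex(List<String> names) {
        Map<String, Long> m = new HashMap<>();
        for (int i = 0; i < names.size(); i++) {
            m.put(names.get(i), (long) i);
        }
        return m;
    }

    // Invert the map once so index -> name lookups are also O(1).
    public static Map<Long, String> invert(Map<String, Long> nameToIndex) {
        Map<Long, String> m = new HashMap<>();
        for (Map.Entry<String, Long> e : nameToIndex.entrySet()) {
            m.put(e.getValue(), e.getKey());
        }
        return m;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("cafe", "bar", "diner"); // placeholder data
        Map<String, Long> toIndex = nameToIndex(names);
        Map<Long, String> toName = invert(toIndex);
        System.out.println(toIndex.get("bar"));  // 1
        System.out.println(toName.get(2L));      // diner
    }
}
```

This is only appropriate when the name/index table fits in driver memory; for large tables, a join on the indexed RDD avoids collecting it.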
Re: Error in running JavaALSExample example from spark examples
Great. Thanks a ton for helping out on this, Sean. I somehow messed this up (and was running in loops for the last 2 hours). Thanks again. -VG On Fri, Jul 22, 2016 at 11:28 PM, Sean Owen wrote: > You mark these provided, which is correct. If the version of Scala > provided at runtime differs, you'll have a problem. > > In fact you can also see you mixed Scala versions in your dependencies > here. MLlib is on 2.10. > > On Fri, Jul 22, 2016 at 6:49 PM, VG wrote: > > Sean, > > > > I am only using the maven dependencies for spark in my pom file. > > I don't have anything else. I guess the maven dependency should resolve to > the > > correct Scala version, isn't it? Any ideas. > > > > > > org.apache.spark > > spark-core_2.11 > > 2.0.0-preview > > provided > > > > > > > > org.apache.spark > > spark-sql_2.11 > > 2.0.0-preview > > provided > > > > > > > > org.apache.spark > > spark-streaming_2.11 > > 2.0.0-preview > > provided > > > > > > > > org.apache.spark > > spark-mllib_2.10 > > 2.0.0-preview > > provided > > > > > > > > > > On Fri, Jul 22, 2016 at 11:16 PM, Sean Owen wrote: > >> > >> -dev > >> Looks like you are mismatching the version of Spark you deploy on at > >> runtime then. Sounds like it was built for Scala 2.10 > >> > >> On Fri, Jul 22, 2016 at 6:43 PM, VG wrote: > >> > Using 2.0.0-preview using maven > >> > So all dependencies should be correct I guess > >> > > >> > > >> > org.apache.spark > >> > spark-core_2.11 > >> > 2.0.0-preview > >> > provided > >> > > >> > > >> > I see in maven dependencies that this brings in > >> > scala-reflect-2.11.4 > >> > scala-compiler-2.11.0 > >> > > >> > and so on > >> > > >> > > >> > > >> > On Fri, Jul 22, 2016 at 11:04 PM, Aaron Ilovici > > >> > wrote: > >> >> > >> >> What version of Spark/Scala are you running? > >> >> > >> >> > >> >> > >> >> -Aaron > >> > > >> > > > > > >
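[Editor's note] To make Sean's diagnosis concrete: in the pom fragments quoted above, every artifact uses the `_2.11` Scala suffix except `spark-mllib_2.10`, so MLlib classes built against Scala 2.10 run on a Scala 2.11 classpath and fail at runtime with NoSuchMethodError in scala.reflect. A sketch of the corrected fragment (only the mllib entry changes; all Spark artifacts must share one Scala suffix):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.11</artifactId><!-- was spark-mllib_2.10: the mismatch -->
  <version>2.0.0-preview</version>
  <scope>provided</scope>
</dependency>
```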
Re: Error in running JavaALSExample example from spark examples
Sean, I am only using the maven dependencies for spark in my pom file. I don't have anything else. I guess the Maven dependency should resolve to the correct Scala version, isn't it? Any ideas? org.apache.spark spark-core_2.11 2.0.0-preview provided org.apache.spark spark-sql_2.11 2.0.0-preview provided org.apache.spark spark-streaming_2.11 2.0.0-preview provided org.apache.spark spark-mllib_2.10 2.0.0-preview provided On Fri, Jul 22, 2016 at 11:16 PM, Sean Owen wrote: > -dev > Looks like you are mismatching the version of Spark you deploy on at > runtime then. Sounds like it was built for Scala 2.10 > > On Fri, Jul 22, 2016 at 6:43 PM, VG wrote: > > Using 2.0.0-preview using maven > > So all dependencies should be correct I guess > > > > > > org.apache.spark > > spark-core_2.11 > > 2.0.0-preview > > provided > > > > > > I see in maven dependencies that this brings in > > scala-reflect-2.11.4 > > scala-compiler-2.11.0 > > > > and so on > > > > > > > > On Fri, Jul 22, 2016 at 11:04 PM, Aaron Ilovici > > wrote: > >> > >> What version of Spark/Scala are you running? > >> > >> > >> > >> -Aaron > > > > >
Re: Error in running JavaALSExample example from spark examples
Using 2.0.0-preview via Maven, so all dependencies should be correct, I guess. org.apache.spark spark-core_2.11 2.0.0-preview provided I see in the maven dependencies that this brings in scala-reflect-2.11.4 scala-compiler-2.11.0 and so on On Fri, Jul 22, 2016 at 11:04 PM, Aaron Ilovici wrote: > What version of Spark/Scala are you running? > > > > -Aaron >
Error in running JavaALSExample example from spark examples
I am getting the following error: Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror; at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:452) Any suggestions to resolve this? VG
Re: Dataset , RDD zipWithIndex -- How to use as a map .
Hi All, Any suggestions for this? Regards, VG On Fri, Jul 22, 2016 at 6:40 PM, VG wrote: > Hi All, > > I am really confused how to proceed further. Please help. > > I have a dataset created as follows: > Dataset b = sqlContext.sql("SELECT bid, name FROM business"); > > Now I need to map each name with a unique index and I did the following > JavaPairRDD indexedBId = business.javaRDD() >.zipWithIndex(); > > In a later part of the code I need to change a data structure and update the name > with the index value generated above. > I am unable to figure out how to do a lookup here. > > Please suggest. > > If there is a better way to do this please suggest that. > > Regards > VG > >
Re: ml ALS.fit(..) issue
Can someone please help here? I tried both Scala 2.10 and 2.11 on the system. On Fri, Jul 22, 2016 at 7:59 PM, VG wrote: > I am using version 2.0.0-preview > > > > On Fri, Jul 22, 2016 at 7:47 PM, VG wrote: > >> I am running into the following error when running ALS >> >> Exception in thread "main" java.lang.NoSuchMethodError: >> scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror; >> at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:452) >> at yelp.TestUser.main(TestUser.java:101) >> >> Line 101 in the above error is the following in my code: >> >> ALSModel model = als.fit(training); >> >> >> Does anyone have a suggestion about what is going on here and where I might be >> going wrong? >> Please suggest >> >> -VG >> > >
Re: ml ALS.fit(..) issue
I am using version 2.0.0-preview On Fri, Jul 22, 2016 at 7:47 PM, VG wrote: > I am running into the following error when running ALS > > Exception in thread "main" java.lang.NoSuchMethodError: > scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror; > at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:452) > at yelp.TestUser.main(TestUser.java:101) > > here line 101 in the above error is the following in code. > > ALSModel model = als.fit(training); > > > Does anyone has a suggestion what is going on here and where I might be > going wrong ? > Please suggest > > -VG >
ml ALS.fit(..) issue
I am running into the following error when running ALS: Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror; at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:452) at yelp.TestUser.main(TestUser.java:101) Line 101 in the above error is the following in my code: ALSModel model = als.fit(training); Does anyone have a suggestion about what is going on here and where I might be going wrong? Please advise. -VG
Dataset , RDD zipWithIndex -- How to use as a map .
Hi All, I am really confused about how to proceed further. Please help. I have a dataset created as follows: Dataset b = sqlContext.sql("SELECT bid, name FROM business"); Now I need to map each name to a unique index, and I did the following: JavaPairRDD indexedBId = business.javaRDD() .zipWithIndex(); In a later part of the code I need to change a data structure and update the name with the index value generated above. I am unable to figure out how to do a lookup here. Please suggest. If there is a better way to do this, please suggest that. Regards VG
Re: MLlib, Java, and DataFrame
Interesting. thanks for this information. On Fri, Jul 22, 2016 at 11:26 AM, Bryan Cutler wrote: > ML has a DataFrame based API, while MLlib is RDDs and will be deprecated > as of Spark 2.0. > > On Thu, Jul 21, 2016 at 10:41 PM, VG wrote: > >> Why do we have these 2 packages ... ml and mlib? >> What is the difference in these >> >> >> >> On Fri, Jul 22, 2016 at 11:09 AM, Bryan Cutler wrote: >> >>> Hi JG, >>> >>> If you didn't know this, Spark MLlib has 2 APIs, one of which uses >>> DataFrames. Take a look at this example >>> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaLinearRegressionWithElasticNetExample.java >>> >>> This example uses a Dataset, which is type equivalent to a >>> DataFrame. >>> >>> >>> On Thu, Jul 21, 2016 at 8:41 PM, Jean Georges Perrin >>> wrote: >>> >>>> Hi, >>>> >>>> I am looking for some really super basic examples of MLlib (like a >>>> linear regression over a list of values) in Java. I have found a few, but I >>>> only saw them using JavaRDD... and not DataFrame. >>>> >>>> I was kind of hoping to take my current DataFrame and send them in >>>> MLlib. Am I too optimistic? Do you know/have any example like that? >>>> >>>> Thanks! >>>> >>>> jg >>>> >>>> >>>> Jean Georges Perrin >>>> j...@jgp.net / @jgperrin >>>> >>>> >>>> >>>> >>>> >>> >> >
Re: MLlib, Java, and DataFrame
Why do we have these 2 packages ... ml and mllib? What is the difference between these? On Fri, Jul 22, 2016 at 11:09 AM, Bryan Cutler wrote: > Hi JG, > > If you didn't know this, Spark MLlib has 2 APIs, one of which uses > DataFrames. Take a look at this example > https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaLinearRegressionWithElasticNetExample.java > > This example uses a Dataset, which is type equivalent to a DataFrame. > > > On Thu, Jul 21, 2016 at 8:41 PM, Jean Georges Perrin wrote: > >> Hi, >> >> I am looking for some really super basic examples of MLlib (like a linear >> regression over a list of values) in Java. I have found a few, but I only >> saw them using JavaRDD... and not DataFrame. >> >> I was kind of hoping to take my current DataFrame and send them in MLlib. >> Am I too optimistic? Do you know/have any example like that? >> >> Thanks! >> >> jg >> >> >> Jean Georges Perrin >> j...@jgp.net / @jgperrin >> >> >> >> >> >
Re: spark-xml - xml parsing when rows only have attributes
Great.. thanks for pointing this out. On Fri, Jun 17, 2016 at 6:21 PM, Ted Yu wrote: > Please see https://github.com/databricks/spark-xml/issues/92 > > On Fri, Jun 17, 2016 at 5:19 AM, VG wrote: > >> I am using spark-xml for loading data and creating a data frame. >> >> If xml element has sub elements and values, then it works fine. Example >> if the xml element is like >> >> >> test >> >> >> however if the xml element is bare with just attributes, then it does not >> work - Any suggestions. >> Does not load the data >> >> >> >> Any suggestions to fix this >> >> >> >> >> >> >> On Fri, Jun 17, 2016 at 4:28 PM, Siva A wrote: >> >>> Use Spark XML version,0.3.3 >>> >>> com.databricks >>> spark-xml_2.10 >>> 0.3.3 >>> >>> >>> On Fri, Jun 17, 2016 at 4:25 PM, VG wrote: >>> >>>> Hi Siva >>>> >>>> This is what i have for jars. Did you manage to run with these or >>>> different versions ? >>>> >>>> >>>> >>>> org.apache.spark >>>> spark-core_2.10 >>>> 1.6.1 >>>> >>>> >>>> org.apache.spark >>>> spark-sql_2.10 >>>> 1.6.1 >>>> >>>> >>>> com.databricks >>>> spark-xml_2.10 >>>> 0.2.0 >>>> >>>> >>>> org.scala-lang >>>> scala-library >>>> 2.10.6 >>>> >>>> >>>> Thanks >>>> VG >>>> >>>> >>>> On Fri, Jun 17, 2016 at 4:16 PM, Siva A >>>> wrote: >>>> >>>>> Hi Marco, >>>>> >>>>> I did run in IDE(Intellij) as well. It works fine. >>>>> VG, make sure the right jar is in classpath. >>>>> >>>>> --Siva >>>>> >>>>> On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni >>>>> wrote: >>>>> >>>>>> and your eclipse path is correct? >>>>>> i suggest, as Siva did before, to build your jar and run it via >>>>>> spark-submit by specifying the --packages option >>>>>> it's as simple as run this command >>>>>> >>>>>> spark-submit --packages >>>>>> com.databricks:spark-xml_: --class >>>>> of >>>>>> your class containing main> >>>>>> >>>>>> Indeed, if you have only these lines to run, why dont you try them in >>>>>> spark-shell ? 
>>>>>> >>>>>> hth >>>>>> >>>>>> On Fri, Jun 17, 2016 at 11:32 AM, VG wrote: >>>>>> >>>>>>> nopes. eclipse. >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A >>>>>>> wrote: >>>>>>> >>>>>>>> If you are running from IDE, Are you using Intellij? >>>>>>>> >>>>>>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Can you try to package as a jar and run using spark-submit >>>>>>>>> >>>>>>>>> Siva >>>>>>>>> >>>>>>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG wrote: >>>>>>>>> >>>>>>>>>> I am trying to run from IDE and everything else is working fine. >>>>>>>>>> I added spark-xml jar and now I ended up into this dependency >>>>>>>>>> >>>>>>>>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager >>>>>>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError: >>>>>>>>>> scala/collection/GenTraversableOnce$class* >>>>>>>>>> at >>>>>>>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150) >>>>>>>>>> at >>>>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154) >>>>>>>>>> at >>>>>>>>>> org.apache.spark.sql.Data
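[Editor's note] Following up on Ted's pointer earlier in the archive: spark-xml issue #92 covers exactly this case (rows that carry only attributes), and it is addressed in the 0.3.3 release Siva suggests. A hypothetical sketch of the read path (Spark 1.6-era API; the rowTag value and file path are placeholders; with 0.3.3+, attribute values surface as columns named with the library's attribute prefix rather than being dropped):

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Assumes an existing SQLContext `sqlContext` and spark-xml_2.10 0.3.3 on the classpath.
DataFrame df = sqlContext.read()
    .format("com.databricks.spark.xml")
    .option("rowTag", "row")     // placeholder row tag
    .load("books.xml");          // placeholder path
df.printSchema();                // attribute-only rows now yield columns instead of an empty frame
```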
spark-xml - xml parsing when rows only have attributes
I am using spark-xml for loading data and creating a data frame. If the XML element has sub-elements and values, then it works fine. Example: if the xml element is like test. However, if the XML element is bare, with just attributes, then it does not work; it does not load the data. Any suggestions to fix this? On Fri, Jun 17, 2016 at 4:28 PM, Siva A wrote: > Use Spark XML version,0.3.3 > > com.databricks > spark-xml_2.10 > 0.3.3 > > > On Fri, Jun 17, 2016 at 4:25 PM, VG wrote: > >> Hi Siva >> >> This is what i have for jars. Did you manage to run with these or >> different versions ? >> >> >> >> org.apache.spark >> spark-core_2.10 >> 1.6.1 >> >> >> org.apache.spark >> spark-sql_2.10 >> 1.6.1 >> >> >> com.databricks >> spark-xml_2.10 >> 0.2.0 >> >> >> org.scala-lang >> scala-library >> 2.10.6 >> >> >> Thanks >> VG >> >> >> On Fri, Jun 17, 2016 at 4:16 PM, Siva A wrote: >> >>> Hi Marco, >>> >>> I did run in IDE(Intellij) as well. It works fine. >>> VG, make sure the right jar is in classpath. >>> >>> --Siva >>> >>> On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni >>> wrote: >>> >>>> and your eclipse path is correct? >>>> i suggest, as Siva did before, to build your jar and run it via >>>> spark-submit by specifying the --packages option >>>> it's as simple as run this command >>>> >>>> spark-submit --packages >>>> com.databricks:spark-xml_: --class >>> your class containing main> >>>> >>>> Indeed, if you have only these lines to run, why dont you try them in >>>> spark-shell ? >>>> >>>> hth >>>> >>>> On Fri, Jun 17, 2016 at 11:32 AM, VG wrote: >>>> >>>>> nopes. eclipse. >>>>> >>>>> >>>>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A >>>>> wrote: >>>>> >>>>>> If you are running from IDE, Are you using Intellij? 
>>>>>> >>>>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A >>>>>> wrote: >>>>>> >>>>>>> Can you try to package as a jar and run using spark-submit >>>>>>> >>>>>>> Siva >>>>>>> >>>>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG wrote: >>>>>>> >>>>>>>> I am trying to run from IDE and everything else is working fine. >>>>>>>> I added spark-xml jar and now I ended up into this dependency >>>>>>>> >>>>>>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager >>>>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError: >>>>>>>> scala/collection/GenTraversableOnce$class* >>>>>>>> at >>>>>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150) >>>>>>>> at >>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154) >>>>>>>> at >>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) >>>>>>>> at >>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) >>>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19) >>>>>>>> Caused by:* java.lang.ClassNotFoundException: >>>>>>>> scala.collection.GenTraversableOnce$class* >>>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) >>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) >>>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) >>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) >>>>>>>> ... 5 more >>>>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown >>>>>>>> hook >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni < >>>>>>>> mmistr...@gmail.com> wr
Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org
It proceeded with the jars I mentioned. However no data getting loaded into data frame... sob sob :( On Fri, Jun 17, 2016 at 4:25 PM, VG wrote: > Hi Siva > > This is what i have for jars. Did you manage to run with these or > different versions ? > > > > org.apache.spark > spark-core_2.10 > 1.6.1 > > > org.apache.spark > spark-sql_2.10 > 1.6.1 > > > com.databricks > spark-xml_2.10 > 0.2.0 > > > org.scala-lang > scala-library > 2.10.6 > > > Thanks > VG > > > On Fri, Jun 17, 2016 at 4:16 PM, Siva A wrote: > >> Hi Marco, >> >> I did run in IDE(Intellij) as well. It works fine. >> VG, make sure the right jar is in classpath. >> >> --Siva >> >> On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni >> wrote: >> >>> and your eclipse path is correct? >>> i suggest, as Siva did before, to build your jar and run it via >>> spark-submit by specifying the --packages option >>> it's as simple as run this command >>> >>> spark-submit --packages >>> com.databricks:spark-xml_: --class >> your class containing main> >>> >>> Indeed, if you have only these lines to run, why dont you try them in >>> spark-shell ? >>> >>> hth >>> >>> On Fri, Jun 17, 2016 at 11:32 AM, VG wrote: >>> >>>> nopes. eclipse. >>>> >>>> >>>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A >>>> wrote: >>>> >>>>> If you are running from IDE, Are you using Intellij? >>>>> >>>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A >>>>> wrote: >>>>> >>>>>> Can you try to package as a jar and run using spark-submit >>>>>> >>>>>> Siva >>>>>> >>>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG wrote: >>>>>> >>>>>>> I am trying to run from IDE and everything else is working fine. 
>>>>>>> I added spark-xml jar and now I ended up into this dependency >>>>>>> >>>>>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager >>>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError: >>>>>>> scala/collection/GenTraversableOnce$class* >>>>>>> at >>>>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150) >>>>>>> at >>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154) >>>>>>> at >>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) >>>>>>> at >>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) >>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19) >>>>>>> Caused by:* java.lang.ClassNotFoundException: >>>>>>> scala.collection.GenTraversableOnce$class* >>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) >>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) >>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) >>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) >>>>>>> ... 5 more >>>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown >>>>>>> hook >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni >>>>>> > wrote: >>>>>>> >>>>>>>> So you are using spark-submit or spark-shell? >>>>>>>> >>>>>>>> you will need to launch either by passing --packages option (like >>>>>>>> in the example below for spark-csv). you will need to iknow >>>>>>>> >>>>>>>> --packages com.databricks:spark-xml_:>>>>>>> version> >>>>>>>> >>>>>>>> hth >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG wrote: >>>>>>>> >>>>>>>>> Apologies for that. >>>>>>>>&g
Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org
Hi Siva This is what i have for jars. Did you manage to run with these or different versions ? org.apache.spark spark-core_2.10 1.6.1 org.apache.spark spark-sql_2.10 1.6.1 com.databricks spark-xml_2.10 0.2.0 org.scala-lang scala-library 2.10.6 Thanks VG On Fri, Jun 17, 2016 at 4:16 PM, Siva A wrote: > Hi Marco, > > I did run in IDE(Intellij) as well. It works fine. > VG, make sure the right jar is in classpath. > > --Siva > > On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni > wrote: > >> and your eclipse path is correct? >> i suggest, as Siva did before, to build your jar and run it via >> spark-submit by specifying the --packages option >> it's as simple as run this command >> >> spark-submit --packages >> com.databricks:spark-xml_: --class > your class containing main> >> >> Indeed, if you have only these lines to run, why dont you try them in >> spark-shell ? >> >> hth >> >> On Fri, Jun 17, 2016 at 11:32 AM, VG wrote: >> >>> nopes. eclipse. >>> >>> >>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A >>> wrote: >>> >>>> If you are running from IDE, Are you using Intellij? >>>> >>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A >>>> wrote: >>>> >>>>> Can you try to package as a jar and run using spark-submit >>>>> >>>>> Siva >>>>> >>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG wrote: >>>>> >>>>>> I am trying to run from IDE and everything else is working fine. 
>>>>>> I added spark-xml jar and now I ended up into this dependency >>>>>> >>>>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager >>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError: >>>>>> scala/collection/GenTraversableOnce$class* >>>>>> at >>>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150) >>>>>> at >>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154) >>>>>> at >>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) >>>>>> at >>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) >>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19) >>>>>> Caused by:* java.lang.ClassNotFoundException: >>>>>> scala.collection.GenTraversableOnce$class* >>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) >>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) >>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) >>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) >>>>>> ... 5 more >>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown >>>>>> hook >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni >>>>>> wrote: >>>>>> >>>>>>> So you are using spark-submit or spark-shell? >>>>>>> >>>>>>> you will need to launch either by passing --packages option (like in >>>>>>> the example below for spark-csv). you will need to iknow >>>>>>> >>>>>>> --packages com.databricks:spark-xml_: >>>>>>> >>>>>>> hth >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG wrote: >>>>>>> >>>>>>>> Apologies for that. >>>>>>>> I am trying to use spark-xml to load data of a xml file. 
>>>>>>>> >>>>>>>> here is the exception >>>>>>>> >>>>>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager >>>>>>>> Exception in thread "main" java.lang.ClassNotFoundException: Failed >>>>>>>> to find data source: org.apache.spark.xml. Please find packages at >>>>>>>> http://spark-packages.org >>>>>>>> at >>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(Reso
Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org
nopes. eclipse. On Fri, Jun 17, 2016 at 3:58 PM, Siva A wrote: > If you are running from IDE, Are you using Intellij? > > On Fri, Jun 17, 2016 at 3:20 PM, Siva A wrote: > >> Can you try to package as a jar and run using spark-submit >> >> Siva >> >> On Fri, Jun 17, 2016 at 3:17 PM, VG wrote: >> >>> I am trying to run from IDE and everything else is working fine. >>> I added spark-xml jar and now I ended up into this dependency >>> >>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager >>> Exception in thread "main" *java.lang.NoClassDefFoundError: >>> scala/collection/GenTraversableOnce$class* >>> at >>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150) >>> at >>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154) >>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) >>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) >>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19) >>> Caused by:* java.lang.ClassNotFoundException: >>> scala.collection.GenTraversableOnce$class* >>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) >>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) >>> ... 5 more >>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook >>> >>> >>> >>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni >>> wrote: >>> >>>> So you are using spark-submit or spark-shell? >>>> >>>> you will need to launch either by passing --packages option (like in >>>> the example below for spark-csv). you will need to iknow >>>> >>>> --packages com.databricks:spark-xml_: >>>> >>>> hth >>>> >>>> >>>> >>>> On Fri, Jun 17, 2016 at 10:20 AM, VG wrote: >>>> >>>>> Apologies for that. >>>>> I am trying to use spark-xml to load data of a xml file. 
>>>>> >>>>> here is the exception >>>>> >>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager >>>>> Exception in thread "main" java.lang.ClassNotFoundException: Failed to >>>>> find data source: org.apache.spark.xml. Please find packages at >>>>> http://spark-packages.org >>>>> at >>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77) >>>>> at >>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102) >>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) >>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) >>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19) >>>>> Caused by: java.lang.ClassNotFoundException: >>>>> org.apache.spark.xml.DefaultSource >>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) >>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) >>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) >>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) >>>>> at >>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62) >>>>> at >>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62) >>>>> at scala.util.Try$.apply(Try.scala:192) >>>>> at >>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62) >>>>> at >>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62) >>>>> at scala.util.Try.orElse(Try.scala:84) >>>>> at >>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62) >>>>> ... 
4 more >>>>> >>>>> Code >>>>> SQLContext sqlContext = new SQLContext(sc); >>>>> DataFrame df = sqlContext.read() >>>>> .format("org.apache.spark.xml") >>>>> .option("rowTag", "row") >>>>> .load("A.xml"); >>>>> >>>>> Any suggestions please .. >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni >>>>> wrote: >>>>> >>>>>> too little info >>>>>> it'll help if you can post the exception and show your sbt file (if >>>>>> you are using sbt), and provide minimal details on what you are doing >>>>>> kr >>>>>> >>>>>> On Fri, Jun 17, 2016 at 10:08 AM, VG wrote: >>>>>> >>>>>>> Failed to find data source: com.databricks.spark.xml >>>>>>> >>>>>>> Any suggestions to resolve this >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org
I am trying to run from IDE and everything else is working fine. I added spark-xml jar and now I ended up with this dependency error 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager Exception in thread "main" *java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class* at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19) Caused by:* java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class* at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 5 more 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni wrote: > So you are using spark-submit or spark-shell? > > you will need to launch either by passing --packages option (like in the > example below for spark-csv). you will need to know > > --packages com.databricks:spark-xml_: > > hth > > > > On Fri, Jun 17, 2016 at 10:20 AM, VG wrote: > >> Apologies for that. >> I am trying to use spark-xml to load data of an xml file. >> >> here is the exception >> >> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager >> Exception in thread "main" java.lang.ClassNotFoundException: Failed to >> find data source: org.apache.spark.xml.
Please find packages at >> http://spark-packages.org >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102) >> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) >> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) >> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19) >> Caused by: java.lang.ClassNotFoundException: >> org.apache.spark.xml.DefaultSource >> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62) >> at scala.util.Try$.apply(Try.scala:192) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62) >> at scala.util.Try.orElse(Try.scala:84) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62) >> ... 4 more >> >> Code >> SQLContext sqlContext = new SQLContext(sc); >> DataFrame df = sqlContext.read() >> .format("org.apache.spark.xml") >> .option("rowTag", "row") >> .load("A.xml"); >> >> Any suggestions please .. 
>> >> >> >> >> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni >> wrote: >> >>> too little info >>> it'll help if you can post the exception and show your sbt file (if you >>> are using sbt), and provide minimal details on what you are doing >>> kr >>> >>> On Fri, Jun 17, 2016 at 10:08 AM, VG wrote: >>> >>>> Failed to find data source: com.databricks.spark.xml >>>> >>>> Any suggestions to resolve this >>>> >>>> >>>> >>> >> >
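[Editor's note: a NoClassDefFoundError on scala.collection.GenTraversableOnce$class usually indicates a Scala binary-version mismatch. The _2.10 / _2.11 suffix on each Spark artifactId is the Scala binary version that jar was compiled against, and every Scala-suffixed artifact, plus scala-library itself, must agree. A small illustrative consistency check, with made-up helper names:]

```java
import java.util.Arrays;
import java.util.List;

public class ScalaSuffixCheck {
    // Mixing suffixes (e.g. a _2.11 jar next to _2.10 ones, or next to a
    // 2.10 scala-library) surfaces as NoClassDefFoundError on Scala
    // internals such as scala.collection.GenTraversableOnce$class.
    static boolean consistent(List<String> artifactIds) {
        String suffix = null;
        for (String id : artifactIds) {
            int i = id.lastIndexOf("_2.");
            if (i < 0) continue;              // not a Scala-suffixed artifact
            String s = id.substring(i + 1);   // e.g. "2.10"
            if (suffix == null) suffix = s;
            else if (!suffix.equals(s)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The dependency list from this thread, all on the same suffix:
        System.out.println(consistent(Arrays.asList(
                "spark-core_2.10", "spark-sql_2.10", "spark-xml_2.10")));
        // A mismatched set of the kind that triggers the error:
        System.out.println(consistent(Arrays.asList(
                "spark-core_2.10", "spark-xml_2.11")));
    }
}
```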
Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org
Hi Siva, I still get a similar exception (See the highlighted section - It is looking for DataSource) 16/06/17 15:11:37 INFO BlockManagerMaster: Registered BlockManager Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: xml. Please find packages at http://spark-packages.org at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19) *Caused by: java.lang.ClassNotFoundException: xml.DefaultSource* at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62) at scala.util.Try.orElse(Try.scala:84) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62) ... 
4 more 16/06/17 15:11:38 INFO SparkContext: Invoking stop() from shutdown hook On Fri, Jun 17, 2016 at 2:56 PM, Siva A wrote: > Just try to use "xml" as format like below, > > SQLContext sqlContext = new SQLContext(sc); > DataFrame df = sqlContext.read() > .format("xml") > .option("rowTag", "row") > .load("A.xml"); > > FYR: https://github.com/databricks/spark-xml > > --Siva > > On Fri, Jun 17, 2016 at 2:50 PM, VG wrote: > >> Apologies for that. >> I am trying to use spark-xml to load data of a xml file. >> >> here is the exception >> >> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager >> Exception in thread "main" java.lang.ClassNotFoundException: Failed to >> find data source: org.apache.spark.xml. Please find packages at >> http://spark-packages.org >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102) >> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) >> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) >> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19) >> Caused by: java.lang.ClassNotFoundException: >> org.apache.spark.xml.DefaultSource >> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62) >> at scala.util.Try$.apply(Try.scala:192) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62) >> at >> 
org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62) >> at scala.util.Try.orElse(Try.scala:84) >> at >> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62) >> ... 4 more >> >> Code >> SQLContext sqlContext = new SQLContext(sc); >> DataFrame df = sqlContext.read() >> .format("org.apache.spark.xml") >> .option("rowTag", "row") >> .load("A.xml"); >> >> Any suggestions please .. >> >> >> >> >> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni >> wrote: >> >>> too little info >>> it'll help if you can post the exception and show your sbt file (if you >>> are using sbt), and provide minimal details on what you are doing >>> kr >>> >>> On Fri, Jun 17, 2016 at 10:08 AM, VG wrote: >>> >>>> Failed to find data source: com.databricks.spark.xml >>>> >>>> Any suggestions to resolve this >>>> >>>> >>>> >>> >> >
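[Editor's note: the two "Caused by" lines in this thread, xml.DefaultSource and org.apache.spark.xml.DefaultSource, follow the same pattern: Spark 1.6 resolves a format name reflectively and falls back to trying "<format>.DefaultSource", so a wrong format name, or a missing spark-xml jar, surfaces as ClassNotFoundException on that derived class name. A simplified, hypothetical model of that lookup:]

```java
public class DataSourceLookup {
    // Simplified model of Spark 1.6's ResolvedDataSource.lookupDataSource:
    // try the provider name as given, then provider + ".DefaultSource".
    static String fallbackClassName(String provider) {
        return provider + ".DefaultSource";
    }

    static boolean loadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;           // what surfaces in the stack traces above
        }
    }

    public static void main(String[] args) {
        // Both names from the thread fail the same way on a JVM without
        // the spark-xml jar (and org.apache.spark.xml never existed):
        System.out.println(loadable(fallbackClassName("org.apache.spark.xml")));
        System.out.println(loadable(fallbackClassName("xml")));
    }
}
```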
Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org
Apologies for that. I am trying to use spark-xml to load data of an XML file. Here is the exception: 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.xml. Please find packages at http://spark-packages.org at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19) Caused by: java.lang.ClassNotFoundException: org.apache.spark.xml.DefaultSource at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62) at scala.util.Try.orElse(Try.scala:84) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62) ... 4 more

Code:
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read()
    .format("org.apache.spark.xml")
    .option("rowTag", "row")
    .load("A.xml");

Any suggestions please ..
On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni wrote: > too little info > it'll help if you can post the exception and show your sbt file (if you > are using sbt), and provide minimal details on what you are doing > kr > > On Fri, Jun 17, 2016 at 10:08 AM, VG wrote: > >> Failed to find data source: com.databricks.spark.xml >> >> Any suggestions to resolve this >> >> >> >
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org
Failed to find data source: com.databricks.spark.xml Any suggestions to resolve this
Re: ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
Any suggestions on this please On Wed, Jun 15, 2016 at 10:42 PM, VG wrote: > I have a very simple driver which loads a textFile and filters a >> sub-string from each line in the textfile. >> When the collect action is executed , I am getting an exception. (The >> file is only 90 MB - so I am confused what is going on..) I am running on a >> local standalone cluster >> >> 16/06/15 19:45:22 INFO BlockManagerInfo: Removed broadcast_2_piece0 on >> 192.168.56.1:56413 in memory (size: 2.5 KB, free: 2.4 GB) >> 16/06/15 19:45:22 INFO BlockManagerInfo: Removed broadcast_1_piece0 on >> 192.168.56.1:56413 in memory (size: 1900.0 B, free: 2.4 GB) >> 16/06/15 19:45:22 INFO BlockManagerInfo: Added rdd_2_1 on disk on >> 192.168.56.1:56413 (size: 2.7 MB) >> 16/06/15 19:45:22 INFO MemoryStore: Block taskresult_7 stored as bytes in >> memory (estimated size 2.7 MB, free 2.4 GB) >> 16/06/15 19:45:22 INFO BlockManagerInfo: Added taskresult_7 in memory on >> 192.168.56.1:56413 (size: 2.7 MB, free: 2.4 GB) >> 16/06/15 19:45:22 INFO Executor: Finished task 1.0 in stage 2.0 (TID 7). >> 2823777 bytes result sent via BlockManager) >> 16/06/15 19:45:22 INFO TaskSetManager: Starting task 2.0 in stage 2.0 >> (TID 8, localhost, partition 2, PROCESS_LOCAL, 5422 bytes) >> 16/06/15 19:45:22 INFO Executor: Running task 2.0 in stage 2.0 (TID 8) >> 16/06/15 19:45:22 INFO HadoopRDD: Input split: >> file:/C:/Users/i303551/Downloads/ariba-logs/ssws/access.2016.04.26/access.2016.04.26:67108864+25111592 >> 16/06/15 19:45:22 INFO BlockManagerInfo: Added rdd_2_2 on disk on >> 192.168.56.1:56413 (size: 2.0 MB) >> 16/06/15 19:45:22 INFO MemoryStore: Block taskresult_8 stored as bytes in >> memory (estimated size 2.0 MB, free 2.4 GB) >> 16/06/15 19:45:22 INFO BlockManagerInfo: Added taskresult_8 in memory on >> 192.168.56.1:56413 (size: 2.0 MB, free: 2.4 GB) >> 16/06/15 19:45:22 INFO Executor: Finished task 2.0 in stage 2.0 (TID 8). 
>> 2143771 bytes result sent via BlockManager) >> 16/06/15 19:45:43 ERROR RetryingBlockFetcher: Exception while beginning >> fetch of 1 outstanding blocks >> java.io.IOException: Failed to connect to /192.168.56.1:56413 >> at >> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228) >> at >> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179) >> at >> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:96) >> at >> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140) >> at >> org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120) >> at >> org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:105) >> at >> org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:92) >> at >> org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:546) >> at >> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:76) >> at >> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) >> at >> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) >> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) >> at >> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) >> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) >> at java.lang.Thread.run(Unknown Source) >> Caused by: java.net.ConnectException: Connection timed out: no further >> information: /192.168.56.1:56413 >> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >> at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown 
Source) >> at >> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224) >> at >> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289) >> at >> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) >> at >> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) >> at >> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) >> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) >> at >> io.netty.util.concurrent.SingleThre
Fwd: ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
> > I have a very simple driver which loads a textFile and filters a > sub-string from each line in the textfile. > When the collect action is executed , I am getting an exception. (The > file is only 90 MB - so I am confused what is going on..) I am running on a > local standalone cluster > > 16/06/15 19:45:22 INFO BlockManagerInfo: Removed broadcast_2_piece0 on > 192.168.56.1:56413 in memory (size: 2.5 KB, free: 2.4 GB) > 16/06/15 19:45:22 INFO BlockManagerInfo: Removed broadcast_1_piece0 on > 192.168.56.1:56413 in memory (size: 1900.0 B, free: 2.4 GB) > 16/06/15 19:45:22 INFO BlockManagerInfo: Added rdd_2_1 on disk on > 192.168.56.1:56413 (size: 2.7 MB) > 16/06/15 19:45:22 INFO MemoryStore: Block taskresult_7 stored as bytes in > memory (estimated size 2.7 MB, free 2.4 GB) > 16/06/15 19:45:22 INFO BlockManagerInfo: Added taskresult_7 in memory on > 192.168.56.1:56413 (size: 2.7 MB, free: 2.4 GB) > 16/06/15 19:45:22 INFO Executor: Finished task 1.0 in stage 2.0 (TID 7). > 2823777 bytes result sent via BlockManager) > 16/06/15 19:45:22 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID > 8, localhost, partition 2, PROCESS_LOCAL, 5422 bytes) > 16/06/15 19:45:22 INFO Executor: Running task 2.0 in stage 2.0 (TID 8) > 16/06/15 19:45:22 INFO HadoopRDD: Input split: > file:/C:/Users/i303551/Downloads/ariba-logs/ssws/access.2016.04.26/access.2016.04.26:67108864+25111592 > 16/06/15 19:45:22 INFO BlockManagerInfo: Added rdd_2_2 on disk on > 192.168.56.1:56413 (size: 2.0 MB) > 16/06/15 19:45:22 INFO MemoryStore: Block taskresult_8 stored as bytes in > memory (estimated size 2.0 MB, free 2.4 GB) > 16/06/15 19:45:22 INFO BlockManagerInfo: Added taskresult_8 in memory on > 192.168.56.1:56413 (size: 2.0 MB, free: 2.4 GB) > 16/06/15 19:45:22 INFO Executor: Finished task 2.0 in stage 2.0 (TID 8). 
> 2143771 bytes result sent via BlockManager) > 16/06/15 19:45:43 ERROR RetryingBlockFetcher: Exception while beginning > fetch of 1 outstanding blocks > java.io.IOException: Failed to connect to /192.168.56.1:56413 > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228) > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179) > at > org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:96) > at > org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140) > at > org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120) > at > org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:105) > at > org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:92) > at > org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:546) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:76) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > Caused by: java.net.ConnectException: Connection timed out: no further > information: /192.168.56.1:56413 > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) > at > 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224) > at > io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > ... 1 more > 16/06/15 19:45:43 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 > outstanding blocks after 5000 ms > 16/06/15 19:46:04 ERROR RetryingBlockFetcher: Exception while beginning > fetch of 1 outstanding blocks > java.io.IOException: Failed to connect to /192.168.56.1:56413 > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228) > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179) > at > org.apache.spark.network.netty.NettyBlockTransferService$$anon
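[Editor's note on this last thread: "Failed to connect to /192.168.56.1:56413 ... Connection timed out" means the task-result fetch back to the driver's BlockManager cannot reach that address. 192.168.56.x is typically a VirtualBox host-only adapter, so the driver may have bound to an interface that is unreachable or firewalled from the executor; spark.driver.host and SPARK_LOCAL_IP are the usual knobs for pinning a reachable address. A standalone reachability probe, with the host and port copied from the log above:]

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Check whether a BlockManager endpoint is reachable from this host
    // within the given timeout; any failure (refusal, timeout, no route)
    // returns false, mirroring the fetch failure in the log.
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port taken from the stack trace above; adjust per run.
        System.out.println(reachable("192.168.56.1", 56413, 2000));
    }
}
```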