read snappy compressed files in spark
I want to be able to read Snappy-compressed files in Spark. I can do a val df = spark.read.textFile("hdfs:// path") and it passes that test in the spark shell, but beyond that, when I do a df.show(10, false) or something, it shows me binary data mixed with real text. How do I read the decompressed file in Spark? I can build a DataFrame reader if someone guides or nudges me in the right direction.
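Binary headers mixed with readable text usually mean the files are a container format (for example Snappy-compressed SequenceFiles) rather than plain .snappy text files, so textFile shows the raw bytes. A minimal sketch of both cases, assuming hypothetical paths and that the Hadoop native Snappy libraries are available on each node:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("read-snappy").getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // Case 1: plain text files with a .snappy extension. Hadoop picks the
    // codec from the extension, so textFile decompresses transparently.
    val textDs = spark.read.textFile("hdfs:///data/events.snappy")

    // Case 2: Snappy-compressed SequenceFiles (a common source of "binary
    // mixed with text"). Read them as Writables and unwrap the values.
    val lines = sc
      .sequenceFile("hdfs:///data/events-seq", classOf[LongWritable], classOf[Text])
      .map { case (_, v) => v.toString } // Text is not serializable; convert early
    val df = lines.toDF("value")
    df.show(10, false)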
interpretation of Spark logs about remote fetch latency
Hello Spark users, I have a question while analyzing a sample Spark task. The task performs remote (shuffle) fetches of a few blocks, but the remote fetch time does not make sense to me. Can someone please help me interpret this? The numbers came from the Spark REST API. Task ID 33 needs four blocks and has to fetch three of them from remote machines; in the "shuffleReadMetrics" section, however, it reports "fetchWaitTime" as 0 even though it actually fetches about 2.4 GB from remote machines. Task ID 34 below needs to fetch 4 blocks with a total size of around 3 GB and shows a fetchWaitTime of about 2.4 seconds (2416 ms), which is the only one that makes sense to me. Is this intended behavior?

    "33" : {
      "taskId" : 33,
      "taskMetrics" : {
        "shuffleReadMetrics" : {
          "remoteBlocksFetched" : 3,
          "localBlocksFetched" : 1,
          "fetchWaitTime" : 0,
          "remoteBytesRead" : 2401539138,
          "localBytesRead" : 800513041,
          "recordsRead" : 4
        }
      }
    },
    "34" : {
      "taskId" : 34,
      "taskMetrics" : {
        "shuffleReadMetrics" : {
          "remoteBlocksFetched" : 4,
          "localBlocksFetched" : 0,
          "fetchWaitTime" : 2416,
          "remoteBytesRead" : 3202052194,
          "localBytesRead" : 0,
          "recordsRead" : 4
        }
      }
    },
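One note that may explain the discrepancy, hedged since it comes from the metric's documented meaning rather than from these specific logs: fetchWaitTime only counts time the task spent blocked waiting on shuffle input, so it can legitimately be 0 even with a large remoteBytesRead if the asynchronous fetches completed while the task was busy computing. A minimal sketch of pulling these per-task metrics from the monitoring REST API (host, application ID, and stage/attempt below are hypothetical placeholders):

    import scala.io.Source

    // All identifiers below are placeholders; adjust to your driver UI host,
    // application ID, and the stage/attempt of interest.
    val base  = "http://driver-host:4040/api/v1"
    val appId = "app-20160611121500-0001"
    val url   = s"$base/applications/$appId/stages/5/0/taskList?length=100"

    // Returns a JSON array of tasks; each entry carries
    // taskMetrics.shuffleReadMetrics.{remoteBytesRead, fetchWaitTime, ...}.
    val json = Source.fromURL(url, "UTF-8").mkString
    println(json)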
Exception during creation of ActorReceiver when running ActorWordCount on CDH 5.5.2
I get the following exception on the worker nodes when running the ActorWordCount example:

    Caused by: java.lang.IllegalArgumentException: constructor public akka.actor.CreatorFunctionConsumer(scala.Function0) is incompatible with arguments [class java.lang.Class, class org.apache.spark.examples.streaming.ActorWordCount$$anonfun$2]
        at akka.util.Reflect$.instantiate(Reflect.scala:69)
        at akka.actor.Props.cachedActorClass(Props.scala:203)
        at akka.actor.Props.actorClass(Props.scala:327)
        at akka.dispatch.Mailboxes.getMailboxType(Mailboxes.scala:124)
        at akka.actor.LocalActorRefProvider.actorOf(ActorRefProvider.scala:718)
        ... 20 more
    Caused by: java.lang.IllegalArgumentException: wrong number of arguments
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at akka.util.Reflect$.instantiate(Reflect.scala:65)
        ... 24 more

This exception seems to happen when https://github.com/apache/spark/blob/branch-1.5/streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala#L147 attempts to create the worker actor. That causes the Akka Props class to make a reflection call to instantiate the actor: https://github.com/akka/akka/blob/v2.2.3/akka-actor/src/main/scala/akka/actor/Props.scala#L203 which in turn throws the above exception. What can I do to resolve this issue? Thanks, RP
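A hedged guess at a fix, based only on the versions visible in the trace and links: the Props arguments (a Class plus a closure) look like they were produced by code compiled against Akka 2.3.x, while the linked CreatorFunctionConsumer from Akka 2.2.3 only accepts a single Function0, i.e. the runtime classpath carries an older akka-actor than the one upstream Spark 1.5 was built against. A minimal build.sbt sketch pinning the dependency (versions below are assumptions; check which jars actually reach the executor classpath on CDH):

    // Sketch: align akka-actor with the version the upstream Spark 1.5 line
    // was built against (2.3.x); CDH may bundle a different Akka, so verify
    // what is actually on the cluster classpath.
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-streaming" % "1.5.0" % "provided",
      "com.typesafe.akka" %% "akka-actor"      % "2.3.11"
    )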
Re: Cleaning spark memory
Sent from my iPhone

------ Original message ------
From: Takeshi Yamamuro
Date: June 11, 2016, 12:15
To: Cesar Flores, user
Subject: Re: Cleaning spark memory
Negative Number of Workers used memory in Spark UI
In the Spark UI, the Workers' used memory shows a negative number, as in the attached picture [screenshot attachment: worker memory in the Spark UI]. Spark version: 1.4.0. How can I solve this problem? I'd appreciate your help!
Re: Classpath problem trying to use DataFrames
I encountered similar problems using HiveContext. When my code printed the classloader, it had changed from the AppClassLoader to a MutableClassLoader.

------ Original message ------
From: Harsh J
Date: December 12, 2015, 12:09
To: Christopher Brady, user
Subject: Re: Classpath problem trying to use DataFrames

Do you have all your Hive jars listed in classpath.txt / the SPARK_DIST_CLASSPATH env variable, specifically the hive-exec jar? Is the location of that jar also the same on all the distributed hosts? Passing an explicit executor classpath string may also help overcome this (replace HIVE_BASE_DIR with the root of your Hive installation; a fuller spark-submit sketch follows at the end of this message):

    --conf "spark.executor.extraClassPath=$HIVE_BASE_DIR/hive/lib/*"

On Sat, Dec 12, 2015 at 6:32 AM Christopher Brady wrote:

I'm trying to run a basic "Hello world" type example using DataFrames with Hive in yarn-client mode. My code is:

    JavaSparkContext sc = new JavaSparkContext("yarn-client", "Test app");
    HiveContext sqlContext = new HiveContext(sc.sc());
    sqlContext.sql("SELECT * FROM my_table").count();

The exception I get on the driver is:

    java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.plan.TableDesc

There are no exceptions on the executors. That class is definitely on the classpath of the driver, and it runs without errors in local mode. I haven't been able to find any similar errors on Google. Does anyone know what I'm doing wrong? The full stack trace is included below:

    java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/plan/TableDesc;
        at java.lang.Class.getDeclaredFields0(Native Method)
        at java.lang.Class.privateGetDeclaredFields(Class.java:2436)
        at java.lang.Class.getDeclaredField(Class.java:1946)
        at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
        at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject
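A hedged sketch of Harsh J's suggestion spelled out as a full spark-submit invocation; the application jar, main class, and HIVE_BASE_DIR value are placeholders, and the Hive lib directory must exist at the same path on every executor host:

    # All names below are placeholders. In client mode spark-submit reads
    # spark.driver.extraClassPath before launching the driver JVM, so both
    # settings can be passed as --conf.
    export HIVE_BASE_DIR=/opt/hive-base
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --class com.example.HiveHello \
      --conf "spark.driver.extraClassPath=$HIVE_BASE_DIR/hive/lib/*" \
      --conf "spark.executor.extraClassPath=$HIVE_BASE_DIR/hive/lib/*" \
      my-app.jar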
Re: Using Spark
Awesome, thanks

On Sunday, June 22, 2014, Matei Zaharia wrote:

> Alright, added you.
>
> On Jun 20, 2014, at 2:52 PM, Ricky Thomas wrote:
>
> Hi,
>
> Would like to add ourselves to the user list if possible, please?
>
> Company: truedash
> url: truedash.io
>
> Automatic pulling of all your data into Spark for enterprise visualisation, predictive analytics and data exploration at a low cost.
>
> Currently in development with a few clients.
>
> Thanks
Fwd: Using Spark
Hi,

Would like to add ourselves to the user list if possible, please?

Company: truedash
url: truedash.io

Automatic pulling of all your data into Spark for enterprise visualisation, predictive analytics and data exploration at a low cost.

Currently in development with a few clients.

Thanks