read snappy compressed files in spark

2018-08-31 Thread Ricky
I want to be able to read Snappy-compressed files in Spark. I can do a
val df = spark.read.textFile("hdfs:// path")
and that works in the spark-shell, but beyond that, when I do a
df.show(10,false) or similar, it shows me binary data mixed with real
text. How do I read the file decompressed in Spark? I can build a
DataFrame reader if someone guides or nudges me in the right direction ...
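
In case it helps, a minimal sketch of the read path, assuming the files carry a .snappy extension and were written with Hadoop's SnappyCodec (the path below is a placeholder, and the native Snappy libraries must be present on the executors):

import org.apache.spark.sql.SparkSession

// Hadoop's TextInputFormat resolves the codec from the file extension, so a
// correctly framed .snappy file should come back as plain text.
val spark = SparkSession.builder().appName("read-snappy").getOrCreate()
val df = spark.read.textFile("hdfs:///path/to/data.snappy")
df.show(10, false)

Note that a file written with raw block Snappy (e.g. by a non-Hadoop writer) is not readable by Hadoop's SnappyCodec, which is one common cause of binary-looking output like this.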


interpretation of Spark logs about remote fetch latency

2017-10-12 Thread ricky l
Hello Spark users,

I have a question from analyzing a sample Spark task. The task performs
remote (shuffle) fetches of a few blocks, but the remote fetch time does not
really make sense to me. Can someone please help me interpret this?

The logs came from the Spark REST API. Task 33 needs four blocks and has to
fetch three of them from remote machines. In its "shuffleReadMetrics"
section, however, "fetchWaitTime" is reported as 0 even though the task
fetches about 2.4 GB from remote machines. Task 34 below needs to fetch four
blocks totaling around 3 GB, and it shows a fetchWaitTime of about 2.4
seconds; only the latter makes sense to me.

Is this intended behavior?

"33" : {
  "taskId" : 33,
   
  "taskMetrics" : {

"shuffleReadMetrics" : {
  *"remoteBlocksFetched" : 3,*
  "localBlocksFetched" : 1,
  *"fetchWaitTime" : 0,*
  *"remoteBytesRead" : 2401539138,*
  "localBytesRead" : 800513041,
  "recordsRead" : 4
},
  }
},
"34" : {
  "taskId" : 34,
  
  "taskMetrics" : {

"shuffleReadMetrics" : {
  *"remoteBlocksFetched" : 4,*
  "localBlocksFetched" : 0,
  *"fetchWaitTime" : 2416,*
  *"remoteBytesRead" : 3202052194,*
  "localBytesRead" : 0,
  "recordsRead" : 4
},
  }
},
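
For reference, metrics like these can be pulled per task from the REST API with something like the following (a sketch; the history-server host/port is the default, and the application ID, stage ID, and attempt number are placeholders):

curl "http://localhost:18080/api/v1/applications/<app-id>/stages/<stage-id>/0/taskList"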


Exception during creation of ActorReceiver when running ActorWordCount on CDH 5.5.2

2016-08-29 Thread Ricky Pritchett
I get the following exception on the worker nodes when running the
ActorWordCount example:

Caused by: java.lang.IllegalArgumentException: constructor public akka.actor.CreatorFunctionConsumer(scala.Function0) is incompatible with arguments [class java.lang.Class, class org.apache.spark.examples.streaming.ActorWordCount$$anonfun$2]
  at akka.util.Reflect$.instantiate(Reflect.scala:69)
  at akka.actor.Props.cachedActorClass(Props.scala:203)
  at akka.actor.Props.actorClass(Props.scala:327)
  at akka.dispatch.Mailboxes.getMailboxType(Mailboxes.scala:124)
  at akka.actor.LocalActorRefProvider.actorOf(ActorRefProvider.scala:718)
  ... 20 more
Caused by: java.lang.IllegalArgumentException: wrong number of arguments
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
  at akka.util.Reflect$.instantiate(Reflect.scala:65)
  ... 24 more

The exception seems to happen when
https://github.com/apache/spark/blob/branch-1.5/streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala#L147
attempts to create the worker actor, which causes the Akka Props class to make
a reflection call to instantiate the actor:
https://github.com/akka/akka/blob/v2.2.3/akka-actor/src/main/scala/akka/actor/Props.scala#L203
That reflection call in turn raises the exception above. What can I do to
resolve this issue?

Thanks,
RP
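
For context, a minimal sketch of the Akka 2.2-style construction involved; the actor and its names here are illustrative, not taken from the example:

import akka.actor.{Actor, Props}

// A hypothetical actor with a single String-argument constructor.
class WordActor(prefix: String) extends Actor {
  def receive = { case msg => println(s"$prefix: $msg") }
}

// Props stores the actor class plus constructor arguments; the actor itself
// is only instantiated reflectively when the actor system creates it.
val ok  = Props(classOf[WordActor], "ok") // matches WordActor(String)
val bad = Props(classOf[WordActor])       // matches no constructor
// actorOf(ok) succeeds; actorOf(bad) fails inside akka.util.Reflect.instantiate
// with an IllegalArgumentException, as in the trace above.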

Re: Cleaning spark memory

2016-06-10 Thread Ricky
wml 

Sent from my iPhone

------ Original Message ------
From: Takeshi Yamamuro
Date: 2016-06-11 12:15
To: Cesar Flores
Cc: user
Subject: Re: Cleaning spark memory

Negative Number of Workers used memory in Spark UI

2016-01-10 Thread Ricky
In the Spark UI, the workers' used memory shows a negative number, as in the following picture:

Spark version: 1.4.0
How can I solve this problem? I'd appreciate your help!

[Attachment: screenshot of the Spark UI]


Re: Classpath problem trying to use DataFrames

2015-12-12 Thread Ricky
I encountered similar problems using HiveContext. When the code printed the
classloader, it had changed from the AppClassLoader to a MutableClassLoader.
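
As a one-line check of the kind meant here (a sketch; run it from wherever the Hive calls are made):

// Print which classloader is active on the current thread.
println(Thread.currentThread().getContextClassLoader)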




------ Original Message ------
From: Harsh J
Date: 2015-12-12 12:09
To: Christopher Brady, user
Subject: Re: Classpath problem trying to use DataFrames



Do you have all your Hive jars listed in classpath.txt / the
SPARK_DIST_CLASSPATH environment variable, specifically the hive-exec jar? Is
that jar also in the same location on all of the distributed hosts?

Passing an explicit executor classpath string may also help overcome this
(replace HIVE_BASE_DIR with the root of your Hive installation):


--conf "spark.executor.extraClassPath=$HIVE_BASE_DIR/hive/lib/*"

On Sat, Dec 12, 2015 at 6:32 AM Christopher Brady wrote:

I'm trying to run a basic "Hello world" type example using DataFrames
 with Hive in yarn-client mode. My code is:
 
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.sql.hive.HiveContext;
 
 JavaSparkContext sc = new JavaSparkContext("yarn-client", "Test app");
 HiveContext sqlContext = new HiveContext(sc.sc());
 sqlContext.sql("SELECT * FROM my_table").count();
 
 The exception I get on the driver is:
 java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.plan.TableDesc
 
 There are no exceptions on the executors.
 
 That class is definitely on the classpath of the driver, and it runs
 without errors in local mode. I haven't been able to find any similar
 errors on Google. Does anyone know what I'm doing wrong?
 
 The full stack trace is included below:
 
 java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/plan/TableDesc;
  at java.lang.Class.getDeclaredFields0(Native Method)
  at java.lang.Class.privateGetDeclaredFields(Class.java:2436)
  at java.lang.Class.getDeclaredField(Class.java:1946)
  at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
  at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
  at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
  at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
  at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
  at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
  at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
  at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
  ...

unsubscribe

2014-10-28 Thread Ricky Thomas



Re: Using Spark

2014-06-22 Thread Ricky Thomas
Awesome, thanks

On Sunday, June 22, 2014, Matei Zaharia matei.zaha...@gmail.com wrote:

 Alright, added you.

 On Jun 20, 2014, at 2:52 PM, Ricky Thomas ri...@truedash.io wrote:

 Hi,

 Would like to add ourselves to the user list if possible please?

 Company: truedash
 url: truedash.io

 Automatic pulling of all your data into Spark for enterprise
 visualisation, predictive analytics and data exploration at a low cost.

 Currently in development with a few clients.

 Thanks





Fwd: Using Spark

2014-06-20 Thread Ricky Thomas
Hi,

Would like to add ourselves to the user list if possible please?

Company: truedash
url: truedash.io

Automatic pulling of all your data into Spark for enterprise
visualisation, predictive analytics and data exploration at a low cost.

Currently in development with a few clients.

Thanks