Re: registering Array of CompactBuffer to Kryo
I used this solution to get the class name correctly at runtime:

kryo.register(ClassTag(Class.forName("org.apache.spark.util.collection.CompactBuffer")).wrap.runtimeClass)

2014-10-02 12:50 GMT+02:00 Daniel Darabos:
> How about this?
>
> Class.forName("[Lorg.apache.spark.util.collection.CompactBuffer;")
>
> On Tue, Sep 30, 2014 at 5:33 PM, Andras Barjak wrote:
> > Hi,
> >
> > what is the correct Scala code to register an Array of this private Spark
> > class to Kryo?
> >
> > "java.lang.IllegalArgumentException: Class is not registered:
> > org.apache.spark.util.collection.CompactBuffer[]
> > Note: To register this class use:
> > kryo.register(org.apache.spark.util.collection.CompactBuffer[].class);"
> >
> > thanks,
> >
> > András Barják
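For readers hitting the same error with other array types: both suggestions above exploit the JVM's internal naming scheme for array classes, where an array of `T` is named `[L<fully.qualified.T>;`. A minimal sketch of the two routes, using `java.lang.String` as a stand-in since `CompactBuffer` is private to Spark and not on a plain classpath:

```scala
import scala.reflect.ClassTag

object ArrayClassNameDemo extends App {
  // Daniel's route: resolve the array class directly via the JVM's
  // "[L<name>;" descriptor syntax.
  val viaJvmName = Class.forName("[Ljava.lang.String;")

  // The ClassTag route: wrap the element class tag to obtain the
  // corresponding array class without hard-coding the bracket syntax.
  val viaClassTag =
    ClassTag(Class.forName("java.lang.String")).wrap.runtimeClass

  // Both resolve to the same runtime class.
  assert(viaJvmName == viaClassTag)
  println(viaClassTag.getName) // prints "[Ljava.lang.String;"
}
```

With Kryo on the classpath, either value can then be passed to `kryo.register(...)`; substituting `org.apache.spark.util.collection.CompactBuffer` for `java.lang.String` reproduces the registration from the thread.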
registering Array of CompactBuffer to Kryo
Hi,

what is the correct Scala code to register an Array of this private Spark class to Kryo?

"java.lang.IllegalArgumentException: Class is not registered:
org.apache.spark.util.collection.CompactBuffer[]
Note: To register this class use:
kryo.register(org.apache.spark.util.collection.CompactBuffer[].class);"

thanks,

András Barják
EC2 instances missing SSD drives randomly?
Hi,

Using the Spark 1.0.1 EC2 script I launched 35 m3.2xlarge instances in the Singapore region. Some of the instances came up without the ephemeral internal (non-EBS) SSD devices that are supposed to be attached to them. Some of them have these drives, but not all, and there is no sign of the problem from the outside; one can only notice it by ssh-ing into an instance and running `df -l`, so it looks like a bug to me. I am not sure whether Amazon is not providing the drives or the Spark AMI is misconfiguring something.

Do you have any idea what is going on? I never faced this issue before. It is not that the drives are unformatted or unmounted (as was the case with the new r3 instances); they are not present physically, even though /mnt and /mnt2 are configured properly in fstab. I made several attempts and the result was the same: some of the instances launched with the drives, some without.

Please let me know if you have any ideas about what to do with this inconsistent behaviour.

András
mounting SSD devices of EC2 r3.8xlarge instances
Hi,

I have noticed that upon launching a cluster of r3.8xlarge high-memory instances, the standard /mnt /mnt2 /mnt3 /mnt4 temporary directories get created and set up for temp usage, but they point to the 8 GB root filesystem. The two 320 GB SSDs are not mounted, and they are not even formatted. This problem might affect other EC2 instance types as well, I suppose. I am using 0.9.1; is this something that has been corrected in the 1.0.0 spark-ec2 script?

regards,

András Barják
Re: Spark and Hadoop
You can download any of them; I would go with the latest version, or just download the source and build it yourself. For experimenting with basic things you can simply launch the REPL and start right away in Spark local mode, without touching any Hadoop functionality.

2014-05-20 19:43 GMT+02:00 pcutil:
> I'm a first-time user and need to try just the hello-world kind of program
> in Spark.
>
> Now on the downloads page, I see the following 3 options for pre-built
> packages that I can download:
>
> - Hadoop 1 (HDP1, CDH3)
> - CDH4
> - Hadoop 2 (HDP2, CDH5)
>
> I'm confused which one I need to download. I need to try just the hello-world
> kind of program, which has nothing to do with Hadoop. Or is it that I can
> only use Spark with Hadoop?
>
> Thanks.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-Hadoop-tp6113.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
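To make the "no Hadoop needed" point concrete: any of the pre-built packages works for local experiments, because the Hadoop classes are only exercised when you actually read from HDFS. A minimal hello-world sketch against the Spark 1.x API (the object and app name are illustrative; it assumes a Spark distribution on the classpath):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalHello {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process on all available cores;
    // no Hadoop cluster or HDFS is involved.
    val conf = new SparkConf().setAppName("hello-world").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Distribute a small range and sum it: 1 + 2 + ... + 100 = 5050.
    val sum = sc.parallelize(1 to 100).reduce(_ + _)
    println(s"sum = $sum")

    sc.stop()
  }
}
```

The same expression can be typed straight into `bin/spark-shell`, which creates the `sc` context for you in local mode.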
Re: Spark runs applications in an inconsistent way
> - Spark UI shows the number of succeeded tasks is more than the total
>   number of tasks, e.g. 3500/3000. There are no failed tasks. At this
>   stage the computation keeps running for a long time without returning
>   an answer.

No sign of resubmitted tasks in the command-line logs either? You might want to get more information on what is going on inside the JVM. I don't know what others use, but jvmtop is easy to install on EC2 and lets you monitor the processes.

> - The only way to get an answer from an application is to hopelessly
>   keep running that application multiple times, until by some luck it
>   converges.
>
> I was not able to reproduce this with minimal code, as it seems some
> random factors affect this behavior. I have a suspicion, but I'm not
> sure, that the use of one or more groupByKey() calls intensifies this
> problem.

Is this related to the amount of data you are processing? Is it more likely to happen on large data? My experience on EC2 is that whenever the memory/partitioning/timeout settings are reasonable, the output is quite consistent, even if I stop and restart the cluster another day.
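On the groupByKey() suspicion: groupByKey shuffles every value for a key to a single executor, which can stall or spill badly on large or skewed data, whereas reduceByKey combines values map-side before the shuffle. When the downstream computation is a reduction, swapping one for the other is a common mitigation. A sketch of the substitution using standard Spark API, with illustrative data and an assumed existing SparkContext `sc`:

```scala
// Assumes an existing SparkContext `sc` and key-value data.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// groupByKey moves every value across the network, then sums:
// heavier on shuffle and executor memory.
val viaGroup = pairs.groupByKey().mapValues(_.sum)

// reduceByKey pre-aggregates within each partition, shuffling
// only partial sums: usually far more stable at scale.
val viaReduce = pairs.reduceByKey(_ + _)

// Both yield ("a", 4) and ("b", 2).
```

This does not explain the impossible 3500/3000 task counter, but it is a cheap experiment to rule the shuffle pressure in or out.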