Re: registering Array of CompactBuffer to Kryo

2014-10-02 Thread Andras Barjak
I used this solution to obtain the array class correctly at runtime:

kryo.register(ClassTag(Class.forName("org.apache.spark.util.collection.CompactBuffer")).wrap.runtimeClass)

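For completeness, both this and Daniel's Class.forName variant below can go into a
custom KryoRegistrator. A minimal sketch (the registrator class name is just a
placeholder, and I have not tested this against every Spark version):

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator
import scala.reflect.ClassTag

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    // CompactBuffer is private[spark], so look it up by name at runtime
    val cb = Class.forName("org.apache.spark.util.collection.CompactBuffer")
    kryo.register(cb)
    // register Array[CompactBuffer]; either line works, they resolve to the same class
    kryo.register(ClassTag(cb).wrap.runtimeClass)
    // kryo.register(Class.forName("[Lorg.apache.spark.util.collection.CompactBuffer;"))
  }
}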

2014-10-02 12:50 GMT+02:00 Daniel Darabos:

> How about this?
>
> Class.forName("[Lorg.apache.spark.util.collection.CompactBuffer;")
>
> On Tue, Sep 30, 2014 at 5:33 PM, Andras Barjak wrote:
> > Hi,
> >
> > What is the correct Scala code to register an Array of this private Spark
> > class with Kryo?
> >
> > "java.lang.IllegalArgumentException: Class is not registered:
> > org.apache.spark.util.collection.CompactBuffer[]
> > Note: To register this class use:
> > kryo.register(org.apache.spark.util.collection.CompactBuffer[].class);"
> >
> > thanks,
> >
> > András Barják
>


registering Array of CompactBuffer to Kryo

2014-09-30 Thread Andras Barjak
Hi,

What is the correct Scala code to register an Array of this private Spark
class with Kryo?

"java.lang.IllegalArgumentException: Class is not registered:
org.apache.spark.util.collection.CompactBuffer[]
Note: To register this class use:
kryo.register(org.apache.spark.util.collection.CompactBuffer[].class);"
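
(For context, this error is only raised when registration is required; a minimal
conf that leads to it looks roughly like the following, with the app name and
registrator class name as placeholders:)

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("compact-buffer-kryo")  // placeholder app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Kryo only throws "Class is not registered" when registration is required
  .set("spark.kryo.registrationRequired", "true")
  .set("spark.kryo.registrator", "com.example.MyRegistrator")  // placeholder
val sc = new SparkContext(conf)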

thanks,

András Barják


EC2 instances missing SSD drives randomly?

2014-08-19 Thread Andras Barjak
Hi,

Using the Spark 1.0.1 EC2 script I launched 35 m3.2xlarge instances (in the
Singapore region). Some of the instances came up without the ephemeral
(non-EBS, instance-store) SSD devices that are supposed to be attached to
them. Some of them have these drives, but not all, and there is no sign of
the problem from the outside; one can only notice it by ssh-ing into the
instances and running `df -l`, so it looks like a bug to me.
I am not sure whether Amazon is not providing the drives or the Spark AMI
is misconfiguring something. Do you have any idea what is going on? I never
faced this issue before. It is not that the drives are unformatted or
unmounted (as was the case with the new r3 instances); they are simply not
present. (The /mnt and /mnt2 entries are configured properly in fstab,
though.)

I tried several times and the result was the same: some of the instances
launched with the drives, some without.

Please let me know if you have any ideas about what to do with this
inconsistent behaviour.

András


mounting SSD devices of EC2 r3.8xlarge instances

2014-06-03 Thread Andras Barjak
Hi,
I have noticed that upon launching a cluster consisting of r3.8xlarge
high-memory instances, the standard /mnt, /mnt2, /mnt3 and /mnt4 temporary
directories get created and set up for temp usage; however, they point to
the 8 GB root filesystem.
The two 320 GB SSDs are not mounted, and they are not even formatted.

This problem might affect other EC2 instance types as well, I suppose.
I am using 0.9.1; is this something that has been corrected in the 1.0.0
spark-ec2 script?

regards,
András Barják


Re: Spark and Hadoop

2014-05-20 Thread Andras Barjak
You can download any of them; I would go with the latest version,
or just download the source and build it yourself.
For experimenting with the basics you can simply launch the REPL
and start right away in Spark's local mode, without using any Hadoop at all.
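For example, something along these lines in the spark-shell is already a complete
hello world, entirely in local mode (the numbers are arbitrary):

// inside bin/spark-shell, which already provides the SparkContext as `sc`
val data = sc.parallelize(1 to 1000)   // a small in-memory RDD, no Hadoop needed
val doubled = data.map(_ * 2)          // a simple transformation
println(doubled.reduce(_ + _))         // an action that actually runs the job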


2014-05-20 19:43 GMT+02:00 pcutil:

> I'm a first-time user and need to try just the hello-world kind of program
> in Spark.
>
> Now on the downloads page, I see the following 3 options for pre-built
> packages that I can download:
>
> - Hadoop 1 (HDP1, CDH3)
> - CDH4
> - Hadoop 2 (HDP2, CDH5)
>
> I'm confused about which one I need to download. I need to try just the
> hello-world kind of program, which has nothing to do with Hadoop. Or can I
> only use Spark with Hadoop?
>
> Thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-Hadoop-tp6113.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: Spark runs applications in an inconsistent way

2014-04-23 Thread Andras Barjak
>
> - The Spark UI shows the number of succeeded tasks as more than the total
> number of tasks, e.g. 3500/3000. There are no failed tasks. At this stage
> the computation keeps running for a long time without returning an answer.
>
No sign of resubmitted tasks in the command-line logs either?
You might want to get more information on what is going on inside the JVM.
I don't know what others use, but jvmtop is easy to install on EC2 and lets
you monitor the Spark processes.

>
> - The only way to get an answer from an application is to hopelessly
> keep running that application multiple times, until by some luck it
> converges.
>
> I was not able to reproduce this with a minimal piece of code, as it seems
> some random factors affect this behavior. I have a suspicion, but I'm not
> sure, that the use of one or more groupByKey() calls intensifies this
> problem.
>
Is this related to the amount of data you are processing? Is it more likely
to happen on large data?
My experience on EC2 is that whenever the memory/partitioning/timeout
settings are reasonable, the output is quite consistent, even if I stop and
restart the cluster on another day.
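
The kind of settings I mean, as a rough sketch (the values below are only
illustrative placeholders for an EC2 node, not recommendations; the keys are the
standard Spark conf options of this era):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "20g")         // leave headroom for the OS on the node
  .set("spark.default.parallelism", "400")     // partitions used by shuffles such as groupByKey
  .set("spark.storage.memoryFraction", "0.5")  // fraction of the heap reserved for cached RDDs
  .set("spark.akka.timeout", "200")            // seconds; generous timeouts help on EC2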