Fwd: Key Class - NotSerializableException

2013-12-08 Thread Archit Thakur
Hi Nick,

Yeah, I saw that. I actually used sc.sequenceFile to load data from
HDFS into an RDD. Also, both my key class and value class implement
Hadoop's WritableComparable. Still, I got the error
"java.io.NotSerializableException" when I used sortByKey.

Hierarchy of my classes:

Collection
KeyCollection extends Collection implements WritableComparable
ValueCollection extends Collection implements WritableComparable
DS extends KeyCollection
MS extends ValueCollection

and I use the DS and MS classes for the key and value. With this hierarchy I get
java.io.NotSerializableException with sortByKey. So I made Collection
Serializable, and now it is unable to find a method required for the
static field of the Collection class.
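
For reference, a minimal sketch (with a hypothetical "id" field, not the actual
DS/MS classes) of a key class that is both a Hadoop WritableComparable and
java.io.Serializable, so it can be read from sequence files and also survive the
shuffle that sortByKey introduces:

import java.io.{DataInput, DataOutput, Serializable}
import org.apache.hadoop.io.WritableComparable

// Hypothetical key class: Writable methods for the sequence file, Serializable for the shuffle.
class KeyCollection extends WritableComparable[KeyCollection] with Serializable {
  var id: String = ""                                   // hypothetical payload field

  override def write(out: DataOutput): Unit = out.writeUTF(id)
  override def readFields(in: DataInput): Unit = { id = in.readUTF() }
  override def compareTo(other: KeyCollection): Int = id.compareTo(other.id)
}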

Thanks and Regards,
Archit Thakur.



On Mon, Dec 9, 2013 at 11:38 AM, MLnick  wrote:

> Hi Archit
>
> Spark provides a convenience class for sequence files that provides implicit
> conversion from Writable to the appropriate Scala classes:
>
> import SparkContext._
> sc.sequenceFile[String, String](path)
>
> You should end up with an RDD[(String, String)] and won't have any
> serialization issues.
>
> Hope this helps
> N
>


Re: Key Class - NotSerializableException

2013-12-08 Thread Archit Thakur
1 update:

Now both commands, with and without sortByKey, have started throwing
an error.

Loss was due to java.lang.NoSuchMethodError
java.lang.NoSuchMethodError: com.guavus.logging.Logger.<init>(Ljava/lang/Class;)V
        at com.guavus.mapred.common.collection.Collection.<clinit>(Collection.java:17)




On Mon, Dec 9, 2013 at 11:21 AM, Archit Thakur wrote:

> And Since sortByKey serializes the classes, I guess it has something to do
> with Serialization thing.
>
>
> On Mon, Dec 9, 2013 at 11:19 AM, Archit Thakur 
> wrote:
>
>> I did make the classes Serialized. But now running the same command
>> sc.sequenceFile(file, classOf[Text], classOf[Text]).flatMap(map_
>> func).sortByKey().count(), gives me java.lang.NoSuchMethodError.
>>
>> For the Collection class which I made Serialized accesses one static
>> variable that
>>
>> static com.xyz.logging.Logger Logger = new
>> com.xyz.logging.Logger(Collection.class) and It throws
>>
>> java.lang.NoSuchMethodError: com.guavus.logging.Logger.<init>(Ljava/lang/Class;)V
>>         at com.guavus.mapred.common.collection.Collection.<clinit>(Collection.java:17)
>>
>> but it doesn't do that when I don't sortByKey, ie when I run
>> sc.sequenceFile(file, classOf[Text], classOf[Text]).flatMap(map_
>> func).count() it doesn't throw the error.
>>
>> Thanks and Regards,
>> Archit Thakur.
>>
>>
>>
>> On Mon, Dec 9, 2013 at 10:48 AM, Patrick Wendell wrote:
>>
>>> It's because sorting serializes the data during the shuffle phase.
>>>
>>> On Sun, Dec 8, 2013 at 8:58 PM, Archit Thakur 
>>> wrote:
>>> > Hi,
>>> >
>>> > When I did
>>> >
>>> > sc.sequenceFile(file, classOf[Text],
>>> > classOf[Text]).flatMap(map_func).count()
>>> > It gave me result of 365.
>>> >
>>> > However, when I did
>>> > sc.sequenceFile(file, classOf[Text],
>>> > classOf[Text]).flatMap(map_func).sortByKey().count(),
>>> >
>>> > It threw java.io.NotSerializableException for Key Class returned by
>>> flapMap.
>>> > My question is
>>> > Why does sortByKey require the Key/Value Classes to be Serialized.?
>>> >
>>> > Thanks and Regards,
>>> > Archit Thakur.
>>> >
>>>
>>
>>
>


Re: Key Class - NotSerializableException

2013-12-08 Thread Archit Thakur
And since sortByKey serializes the classes, I guess it has something to do
with serialization.


On Mon, Dec 9, 2013 at 11:19 AM, Archit Thakur wrote:

> I did make the classes Serialized. But now running the same command
> sc.sequenceFile(file, classOf[Text], classOf[Text]).flatMap(map_
> func).sortByKey().count(), gives me java.lang.NoSuchMethodError.
>
> For the Collection class which I made Serialized accesses one static
> variable that
>
> static com.xyz.logging.Logger Logger = new
> com.xyz.logging.Logger(Collection.class) and It throws
>
> java.lang.NoSuchMethodError: com.guavus.logging.Logger.<init>(Ljava/lang/Class;)V
>         at com.guavus.mapred.common.collection.Collection.<clinit>(Collection.java:17)
>
> but it doesn't do that when I don't sortByKey, ie when I run
> sc.sequenceFile(file, classOf[Text], classOf[Text]).flatMap(map_
> func).count() it doesn't throw the error.
>
> Thanks and Regards,
> Archit Thakur.
>
>
>
> On Mon, Dec 9, 2013 at 10:48 AM, Patrick Wendell wrote:
>
>> It's because sorting serializes the data during the shuffle phase.
>>
>> On Sun, Dec 8, 2013 at 8:58 PM, Archit Thakur 
>> wrote:
>> > Hi,
>> >
>> > When I did
>> >
>> > sc.sequenceFile(file, classOf[Text],
>> > classOf[Text]).flatMap(map_func).count()
>> > It gave me result of 365.
>> >
>> > However, when I did
>> > sc.sequenceFile(file, classOf[Text],
>> > classOf[Text]).flatMap(map_func).sortByKey().count(),
>> >
>> > It threw java.io.NotSerializableException for Key Class returned by
>> flapMap.
>> > My question is
>> > Why does sortByKey require the Key/Value Classes to be Serialized.?
>> >
>> > Thanks and Regards,
>> > Archit Thakur.
>> >
>>
>
>


Re: Key Class - NotSerializableException

2013-12-08 Thread Archit Thakur
I did make the classes Serializable. But now running the same command,
sc.sequenceFile(file, classOf[Text], classOf[Text]).flatMap(map_func).sortByKey().count(),
gives me java.lang.NoSuchMethodError.

The Collection class, which I made Serializable, accesses one static
variable:

static com.xyz.logging.Logger Logger = new
com.xyz.logging.Logger(Collection.class), and it throws

java.lang.NoSuchMethodError: com.guavus.logging.Logger.<init>(Ljava/lang/Class;)V
        at com.guavus.mapred.common.collection.Collection.<clinit>(Collection.java:17)

but it doesn't do that when I don't sortByKey, i.e. when I run
sc.sequenceFile(file, classOf[Text], classOf[Text]).flatMap(map_func).count(),
it doesn't throw the error.
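
For context (the class names below are stand-ins, not the real com.xyz/com.guavus
code): the static field is initialized in Collection's static initializer, the
<clinit> frame at line 17 in the trace, and a NoSuchMethodError on
Logger.<init>(Ljava/lang/Class;)V means the Logger class visible on the runtime
classpath has no constructor taking a Class, i.e. a different version of the
logging jar than the one Collection was compiled against: a classpath conflict
rather than a serialization problem. A minimal Scala sketch of the same shape:

// Stand-ins for the real classes; the point is only where the failing call happens.
class Logger(clazz: Class[_]) {                 // the constructor the error says is missing
  def info(msg: String): Unit = println("[" + clazz.getName + "] " + msg)
}

class Collection extends java.io.Serializable

object Collection {
  // Runs when Collection is first initialized in a JVM (the <clinit> frame in the trace).
  val log: Logger = new Logger(classOf[Collection])
}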

Thanks and Regards,
Archit Thakur.



On Mon, Dec 9, 2013 at 10:48 AM, Patrick Wendell  wrote:

> It's because sorting serializes the data during the shuffle phase.
>
> On Sun, Dec 8, 2013 at 8:58 PM, Archit Thakur 
> wrote:
> > Hi,
> >
> > When I did
> >
> > sc.sequenceFile(file, classOf[Text],
> > classOf[Text]).flatMap(map_func).count()
> > It gave me result of 365.
> >
> > However, when I did
> > sc.sequenceFile(file, classOf[Text],
> > classOf[Text]).flatMap(map_func).sortByKey().count(),
> >
> > It threw java.io.NotSerializableException for Key Class returned by
> flapMap.
> > My question is
> > Why does sortByKey require the Key/Value Classes to be Serialized.?
> >
> > Thanks and Regards,
> > Archit Thakur.
> >
>


Re: Spark Import Issue

2013-12-08 Thread Andrew Ash
Also note that when you add parameters to the -cp flag on the JVM and want
to include multiple jars, the only way to do that is by including an entire
directory with "dir/*" -- you can't use "dir/*jar" or "dir/spark*jar" or
anything else like that.

http://stackoverflow.com/questions/219585/setting-multiple-jars-in-java-classpath


On Sun, Dec 8, 2013 at 12:25 AM, Matei Zaharia wrote:

> I’m not sure you can have a star inside that quoted classpath argument
> (the double quotes may cancel the *). Try using the JAR through its full
> name, or link to Spark through Maven (
> http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-java
> ).
>
> Matei
>
> On Dec 6, 2013, at 9:50 AM, Garrett Hamers  wrote:
>
> Hello,
>
> I am new to the spark system, and I am trying to write a simple program to
> get myself familiar with how spark works. I am currently having problem
> with importing the spark package. I am getting the following compiler
> error: package org.apache.spark.api.java does not exist.
>
> I have spark-0.8.0-incubating installed. I ran the commands: sbt/sbt
> compile, sbt/sbt assembly, and sbt/sbt publish-local without any errors. My
> sql.java file is located in the spark-0.8.0-incubating root directory. I
> tried to compile the code using “javac sql.java” and “javac -cp
> "assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating*.jar"
> sql.java”.
>
> Here is the code for sql.java:
>
> package shark;
>
> import java.io.Serializable;
>
> import java.util.List;
>
> import java.io.*;
>
> import org.apache.spark.api.java.*; //Issue is here
>
> public class sql implements Serializable {
>
>   public static void main( String[] args) {
>
> System.out.println("Hello World");
>
>   }
>
> }
>
>
>  What do I need to do in order for java to import the spark code properly?
> Any advice would be greatly appreciated.
>
> Thank you,
> Garrett Hamers
>
>
>


Re: Key Class - NotSerializableException

2013-12-08 Thread Patrick Wendell
It's because sorting serializes the data during the shuffle phase.

On Sun, Dec 8, 2013 at 8:58 PM, Archit Thakur  wrote:
> Hi,
>
> When I did
>
> sc.sequenceFile(file, classOf[Text],
> classOf[Text]).flatMap(map_func).count()
> It gave me result of 365.
>
> However, when I did
> sc.sequenceFile(file, classOf[Text],
> classOf[Text]).flatMap(map_func).sortByKey().count(),
>
> It threw java.io.NotSerializableException for Key Class returned by flapMap.
> My question is
> Why does sortByKey require the Key/Value Classes to be Serialized.?
>
> Thanks and Regards,
> Archit Thakur.
>


Key Class - NotSerializableException

2013-12-08 Thread Archit Thakur
Hi,

When I did

sc.sequenceFile(file, classOf[Text],
classOf[Text]).flatMap(map_func).count()
It gave me a result of 365.

However, when I did
sc.sequenceFile(file, classOf[Text],
classOf[Text]).flatMap(map_func).sortByKey().count(),

It threw java.io.NotSerializableException for the key class returned by
flatMap. My question is:
why does sortByKey require the key/value classes to be serializable?

Thanks and Regards,
Archit Thakur.
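
(The replies above spell out the fix; here is a minimal sketch with a placeholder
path and a trivial stand-in for map_func: convert the Hadoop Text objects into
plain Strings before sorting, since Text is not java.io.Serializable but String is.)

import org.apache.hadoop.io.Text
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._              // pair-RDD functions, including sortByKey

val sc = new SparkContext("local", "sortbykey-example")

val n = sc.sequenceFile("hdfs:///path/to/input", classOf[Text], classOf[Text])
  .map { case (k, v) => (k.toString, v.toString) }  // copy out of the reused Text objects
  .sortByKey()
  .count()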


Re: Bump: on disk storage formats

2013-12-08 Thread Azuryy Yu
Thanks for sharing.
 On 2013-12-09 11:50 AM, "Patrick Wendell"  wrote:

> Parquet might be a good fit for you then... it's pretty new and I
> don't have a lot of direct experience working with it. But I've seen
> examples of people using Spark with Parquet. You might want to
> checkout Matt Massie's post here:
>
> http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/
>
> This gives an example of using the Parquet format with Spark.
>
> - Patrick
>
> On Sun, Dec 8, 2013 at 7:09 PM, Ankur Chauhan 
> wrote:
> > Hi Patrick,
> >
> > I agree this is a very open ended question but I was trying to get a
> general answer anyway but I think you did hint on some nuances.
> > 1. My work load is definitely bottlenecked by disk IO just beacause even
> with a project on a single column(mostly 2-3 out of 20) there is a lot of
> data to churn throught.
> > 2. The fields are mostly all headers and some know parameter fields from
> a http GET request so analysis on let's say account id and user agent or ip
> address is fairly selective.
> > 3. Flattening the fields and using csv definitely looks like something i
> can try out.
> >
> > I believe parquet files can be ceated with a sorted column (for example
> timestamp) that would make selection of the right segment of data easier
> too(although i don't have any experience with parquet files).
> > What is the recommended way of interacting(read/write) with parquet
> files?
> >
> > -- Ankur
> >
> > On 8 Dec 2013, at 17:38, Patrick Wendell  wrote:
> >
> >> This is a very open ended question so it's hard to give a specific
> >> answer... it depends a lot on whether disk IO is a bottleneck in your
> >> workload and whether you tend to analyze all of each record or only
> >> certain fields. If you are doing disk IO a lot and only touching a few
> >> fields something like Parquet might help, or (simpler) just creating
> >> smaller projections of your data with only the fields you care about.
> >> Tab delimited formats can have less serialization overhead than JSON,
> >> so flattening the data might also help. It really depends on your
> >> access patterns and data types.
> >>
> >> In many cases with Spark another important question is how the user
> >> stores the data in-memory, not the on-disk format. It does depend how
> >> they are using Spark though.
> >>
> >> - Patrick
> >>
> >> On Sun, Dec 8, 2013 at 3:03 PM, Andrew Ash 
> wrote:
> >>> LZO compression at a minimum, and using Parquet as a second step,
> >>> seems like the way to go though I haven't tried either personally yet.
> >>>
> >>> Sent from my mobile phone
> >>>
> >>> On Dec 8, 2013, at 16:54, Ankur Chauhan 
> wrote:
> >>>
>  Hi all,
> 
>  Sorry for posting this again but I am interested in finding out what
> different on disk data formats for storing timeline event and analytics
> aggregate data.
> 
>  Currently I am just using newline delimited json gzipped files. I was
> wondering if there were any recommendations.
> 
>  -- Ankur
> >
>


Re: Bump: on disk storage formats

2013-12-08 Thread Patrick Wendell
Parquet might be a good fit for you then... it's pretty new and I
don't have a lot of direct experience working with it. But I've seen
examples of people using Spark with Parquet. You might want to
check out Matt Massie's post here:

http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/

This gives an example of using the Parquet format with Spark.

- Patrick

On Sun, Dec 8, 2013 at 7:09 PM, Ankur Chauhan  wrote:
> Hi Patrick,
>
> I agree this is a very open ended question but I was trying to get a general 
> answer anyway but I think you did hint on some nuances.
> 1. My work load is definitely bottlenecked by disk IO just beacause even with 
> a project on a single column(mostly 2-3 out of 20) there is a lot of data to 
> churn throught.
> 2. The fields are mostly all headers and some know parameter fields from a 
> http GET request so analysis on let's say account id and user agent or ip 
> address is fairly selective.
> 3. Flattening the fields and using csv definitely looks like something i can 
> try out.
>
> I believe parquet files can be ceated with a sorted column (for example 
> timestamp) that would make selection of the right segment of data easier 
> too(although i don't have any experience with parquet files).
> What is the recommended way of interacting(read/write) with parquet files?
>
> -- Ankur
>
> On 8 Dec 2013, at 17:38, Patrick Wendell  wrote:
>
>> This is a very open ended question so it's hard to give a specific
>> answer... it depends a lot on whether disk IO is a bottleneck in your
>> workload and whether you tend to analyze all of each record or only
>> certain fields. If you are doing disk IO a lot and only touching a few
>> fields something like Parquet might help, or (simpler) just creating
>> smaller projections of your data with only the fields you care about.
>> Tab delimited formats can have less serialization overhead than JSON,
>> so flattening the data might also help. It really depends on your
>> access patterns and data types.
>>
>> In many cases with Spark another important question is how the user
>> stores the data in-memory, not the on-disk format. It does depend how
>> they are using Spark though.
>>
>> - Patrick
>>
>> On Sun, Dec 8, 2013 at 3:03 PM, Andrew Ash  wrote:
>>> LZO compression at a minimum, and using Parquet as a second step,
>>> seems like the way to go though I haven't tried either personally yet.
>>>
>>> Sent from my mobile phone
>>>
>>> On Dec 8, 2013, at 16:54, Ankur Chauhan  wrote:
>>>
 Hi all,

 Sorry for posting this again but I am interested in finding out what 
 different on disk data formats for storing timeline event and analytics 
 aggregate data.

 Currently I am just using newline delimited json gzipped files. I was 
 wondering if there were any recommendations.

 -- Ankur
>


Re: Build Spark with maven

2013-12-08 Thread Rajika Kumarasiri
Try to see if that dependency comes in via a transitive dependency by looking
at the output of mvn dependency:tree.

Rajika


On Sat, Dec 7, 2013 at 1:31 AM, Azuryy Yu  wrote:

> Hey dears,
>
> Can you give me a maven repo, so I can compile Spark with Maven.
>
> I'm using http://repo1.maven.org/maven2/ currently
>
> but It complains cannot find akka-actor-2.0.1,  I searched on the
> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too old.
>
> another strange output I can see:
> 2.9.3  in the pom,
> but Maven download scala-2.9.2 during compile, why is that?
>
>
> Thanks.
>


Re: Bump: on disk storage formats

2013-12-08 Thread Ankur Chauhan
Hi Patrick,

I agree this is a very open-ended question, but I was trying to get a general
answer anyway. I think you did hint at some nuances.
1. My workload is definitely bottlenecked by disk IO, because even with a
projection on a single column (mostly 2-3 out of 20) there is a lot of data to
churn through.
2. The fields are mostly all headers and some known parameter fields from an HTTP
GET request, so analysis on, let's say, account id and user agent or IP address is
fairly selective.
3. Flattening the fields and using CSV definitely looks like something I can
try out.

I believe Parquet files can be created with a sorted column (for example,
timestamp), which would make selecting the right segment of data easier
too (although I don't have any experience with Parquet files).
What is the recommended way of interacting (read/write) with Parquet files?

-- Ankur

On 8 Dec 2013, at 17:38, Patrick Wendell  wrote:

> This is a very open ended question so it's hard to give a specific
> answer... it depends a lot on whether disk IO is a bottleneck in your
> workload and whether you tend to analyze all of each record or only
> certain fields. If you are doing disk IO a lot and only touching a few
> fields something like Parquet might help, or (simpler) just creating
> smaller projections of your data with only the fields you care about.
> Tab delimited formats can have less serialization overhead than JSON,
> so flattening the data might also help. It really depends on your
> access patterns and data types.
> 
> In many cases with Spark another important question is how the user
> stores the data in-memory, not the on-disk format. It does depend how
> they are using Spark though.
> 
> - Patrick
> 
> On Sun, Dec 8, 2013 at 3:03 PM, Andrew Ash  wrote:
>> LZO compression at a minimum, and using Parquet as a second step,
>> seems like the way to go though I haven't tried either personally yet.
>> 
>> Sent from my mobile phone
>> 
>> On Dec 8, 2013, at 16:54, Ankur Chauhan  wrote:
>> 
>>> Hi all,
>>> 
>>> Sorry for posting this again but I am interested in finding out what 
>>> different on disk data formats for storing timeline event and analytics 
>>> aggregate data.
>>> 
>>> Currently I am just using newline delimited json gzipped files. I was 
>>> wondering if there were any recommendations.
>>> 
>>> -- Ankur



Re: Build Spark with maven

2013-12-08 Thread Azuryy Yu
@Mark,

It works now after I changed the settings.xml, but it would be better to improve
the Spark documentation a little in the "Building Spark with Maven" section.


On Mon, Dec 9, 2013 at 10:45 AM, Azuryy Yu  wrote:

> @Mark
>
> I configured under apache-maven/conf/settings.xml:
>
>   <mirrors>
>     <mirror>
>       <id>maven-nexus</id>
>       <mirrorOf>external:*</mirrorOf>
>       <name>Official Maven Repo</name>
>       <url>http://repo1.maven.org/maven2/</url>
>     </mirror>
>   </mirrors>
>
>
>
> On Mon, Dec 9, 2013 at 10:41 AM, Azuryy Yu  wrote:
>
>> @Mater, what's your maven mirror used in your setting.xml, can you share
>> with me? Thanks.
>>
>>
>>
>> On Mon, Dec 9, 2013 at 10:14 AM, Azuryy Yu  wrote:
>>
>>> Hi Mark,
>>>
>>> I build the current releast candidate,
>>> It complained during build:
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.pom
>>> [WARNING] The POM for com.typesafe.akka:akka-actor:jar:2.0.5 is missing,
>>> no dependency information available
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.pom
>>> [WARNING] The POM for com.typesafe.akka:akka-remote:jar:2.0.5 is
>>> missing, no dependency information available
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.pom
>>> [WARNING] The POM for com.typesafe.akka:akka-slf4j:jar:2.0.5 is missing,
>>> no dependency information available
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.jar
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.jar
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.jar
>>>
>>>
>>>
>>> then, throw Error:
>>> [ERROR] Failed to execute goal on project spark-core_2.9.3: Could not
>>> resolve dependencies for project
>>> org.apache.spark:spark-core_2.9.3:jar:0.8.1-incubating: The following
>>> artifacts could not be resolved: com.typesafe.akka:akka-actor:jar:2.0.5,
>>> com.typesafe.akka:akka-remote:jar:2.0.5,
>>> com.typesafe.akka:akka-slf4j:jar:2.0.5: Could not find artifact
>>> com.typesafe.akka:akka-actor:jar:2.0.5 in maven-nexus (
>>> http://repo1.maven.org/maven2/) -> [Help 1]
>>>
>>>
>>>
>>> I checked on the http://repo1.maven.org/maven2, there is no akka-*
>>>  2.0.5, and there is no URL such as '
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5', it's
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor_'VERSION'/
>>>
>>>
>>>
>>> On Mon, Dec 9, 2013 at 9:54 AM, Azuryy Yu  wrote:
>>>
 Thanks Mark, I will.


 On Mon, Dec 9, 2013 at 9:53 AM, Mark Hamstra 
 wrote:

> You probably want to try the current release candidate:
> https://github.com/apache/incubator-spark/archive/v0.8.1-incubating.tar.gz
>
>
> On Sun, Dec 8, 2013 at 5:51 PM, Azuryy Yu  wrote:
>
>> Thanks Matei, I'll try.
>>
>> @Mark, I download source package from
>> http://spark-project.org/download/spark-0.8.0-incubating.tgz, Sorry,
>> I build 0.8.0. not 0.8.1.
>>
>>
>> On Mon, Dec 9, 2013 at 9:31 AM, Mark Hamstra > > wrote:
>>
>>> There is no released source package of Spark 0.8.1.  It's just gone
>>> into release candidate in the past day.  Is that what you are trying to
>>> build?  It should be exactly the same as what I just checked out from 
>>> the
>>> v0.8.1-incubating tag.
>>>
>>>
>>> On Sun, Dec 8, 2013 at 5:28 PM, Azuryy Yu wrote:
>>>
 I am not check out from repository, I download source package and
 build.
  On 2013-12-09 9:22 AM, "Mark Hamstra" 
 wrote:

> I don't believe that is true of the Spark 0.8.1 code.  I just got
> done building Spark from the v0.8.1-incubating tag after first 
> removing
> anything to do with akka from my ~/.m2/repository.  After a successful
> build without incident, my local repo now only contains akka 2.0.5 
> packages
> within the com/typesafe/akka subtree.
>
>
>
> On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu wrote:
>
>> I build 0.8.1, maven try to download akka-actor-2.0.1, which is
>> used by scala-core-io.
>>  On 2013-12-09 8:40 AM, "Matei Zaharia" 
>> wrote:
>>
>>> Which version of Spark are you building? AFAIK it should be
>>> using Akka 2.0.5, not 2.0.1.
>>>
>>> Matei
>>>
>>> On Dec 8, 2013, at 3:35 PM, Azuryy Yu 
>>> wrote:
>>>
>>> any thoughs here? I still cannot compile spark using maven,
>>> thanks for any inputs.
>>>  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>>>
 Hey dears,

>>>

Re: Build Spark with maven

2013-12-08 Thread Mark Hamstra
Yeah, don't do that.  The akka packages you need are not in Maven Central,
so you won't be able to find them by looking only in a mirror.  If you
temporarily rename your settings.xml file, build Spark 0.8.1, then put your
settings back, you should be okay as long as you don't remove the akka
packages from your ~/.m2/repository.



On Sun, Dec 8, 2013 at 6:45 PM, Azuryy Yu  wrote:

> @Mark
>
> I configured under apache-maven/conf/settings.xml:
>
>   <mirrors>
>     <mirror>
>       <id>maven-nexus</id>
>       <mirrorOf>external:*</mirrorOf>
>       <name>Official Maven Repo</name>
>       <url>http://repo1.maven.org/maven2/</url>
>     </mirror>
>   </mirrors>
>
>
>
> On Mon, Dec 9, 2013 at 10:41 AM, Azuryy Yu  wrote:
>
>> @Mater, what's your maven mirror used in your setting.xml, can you share
>> with me? Thanks.
>>
>>
>>
>> On Mon, Dec 9, 2013 at 10:14 AM, Azuryy Yu  wrote:
>>
>>> Hi Mark,
>>>
>>> I build the current releast candidate,
>>> It complained during build:
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.pom
>>> [WARNING] The POM for com.typesafe.akka:akka-actor:jar:2.0.5 is missing,
>>> no dependency information available
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.pom
>>> [WARNING] The POM for com.typesafe.akka:akka-remote:jar:2.0.5 is
>>> missing, no dependency information available
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.pom
>>> [WARNING] The POM for com.typesafe.akka:akka-slf4j:jar:2.0.5 is missing,
>>> no dependency information available
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.jar
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.jar
>>> Downloading:
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.jar
>>>
>>>
>>>
>>> then, throw Error:
>>> [ERROR] Failed to execute goal on project spark-core_2.9.3: Could not
>>> resolve dependencies for project
>>> org.apache.spark:spark-core_2.9.3:jar:0.8.1-incubating: The following
>>> artifacts could not be resolved: com.typesafe.akka:akka-actor:jar:2.0.5,
>>> com.typesafe.akka:akka-remote:jar:2.0.5,
>>> com.typesafe.akka:akka-slf4j:jar:2.0.5: Could not find artifact
>>> com.typesafe.akka:akka-actor:jar:2.0.5 in maven-nexus (
>>> http://repo1.maven.org/maven2/) -> [Help 1]
>>>
>>>
>>>
>>> I checked on the http://repo1.maven.org/maven2, there is no akka-*
>>>  2.0.5, and there is no URL such as '
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5', it's
>>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor_'VERSION'/
>>>
>>>
>>>
>>> On Mon, Dec 9, 2013 at 9:54 AM, Azuryy Yu  wrote:
>>>
 Thanks Mark, I will.


 On Mon, Dec 9, 2013 at 9:53 AM, Mark Hamstra 
 wrote:

> You probably want to try the current release candidate:
> https://github.com/apache/incubator-spark/archive/v0.8.1-incubating.tar.gz
>
>
> On Sun, Dec 8, 2013 at 5:51 PM, Azuryy Yu  wrote:
>
>> Thanks Matei, I'll try.
>>
>> @Mark, I download source package from
>> http://spark-project.org/download/spark-0.8.0-incubating.tgz, Sorry,
>> I build 0.8.0. not 0.8.1.
>>
>>
>> On Mon, Dec 9, 2013 at 9:31 AM, Mark Hamstra > > wrote:
>>
>>> There is no released source package of Spark 0.8.1.  It's just gone
>>> into release candidate in the past day.  Is that what you are trying to
>>> build?  It should be exactly the same as what I just checked out from 
>>> the
>>> v0.8.1-incubating tag.
>>>
>>>
>>> On Sun, Dec 8, 2013 at 5:28 PM, Azuryy Yu wrote:
>>>
 I am not check out from repository, I download source package and
 build.
  On 2013-12-09 9:22 AM, "Mark Hamstra" 
 wrote:

> I don't believe that is true of the Spark 0.8.1 code.  I just got
> done building Spark from the v0.8.1-incubating tag after first 
> removing
> anything to do with akka from my ~/.m2/repository.  After a successful
> build without incident, my local repo now only contains akka 2.0.5 
> packages
> within the com/typesafe/akka subtree.
>
>
>
> On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu wrote:
>
>> I build 0.8.1, maven try to download akka-actor-2.0.1, which is
>> used by scala-core-io.
>>  On 2013-12-09 8:40 AM, "Matei Zaharia" 
>> wrote:
>>
>>> Which version of Spark are you building? AFAIK it should be
>>> using Akka 2.0.5, not 2.0.1.
>>>
>>> Matei
>>>
>>> On Dec 8, 2013, at 3:35 PM, Azuryy Yu 
>>> wrote:
>>>
>>> any thoughs here? I still cannot compile spark using maven,
>>> thanks for any inputs.
>>>  On 201

Re: Build Spark with maven

2013-12-08 Thread Azuryy Yu
@Mark

I configured under apache-maven/conf/settings.xml:

  <mirrors>
    <mirror>
      <id>maven-nexus</id>
      <mirrorOf>external:*</mirrorOf>
      <name>Official Maven Repo</name>
      <url>http://repo1.maven.org/maven2/</url>
    </mirror>
  </mirrors>



On Mon, Dec 9, 2013 at 10:41 AM, Azuryy Yu  wrote:

> @Mater, what's your maven mirror used in your setting.xml, can you share
> with me? Thanks.
>
>
>
> On Mon, Dec 9, 2013 at 10:14 AM, Azuryy Yu  wrote:
>
>> Hi Mark,
>>
>> I build the current releast candidate,
>> It complained during build:
>> Downloading:
>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.pom
>> [WARNING] The POM for com.typesafe.akka:akka-actor:jar:2.0.5 is missing,
>> no dependency information available
>> Downloading:
>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.pom
>> [WARNING] The POM for com.typesafe.akka:akka-remote:jar:2.0.5 is missing,
>> no dependency information available
>> Downloading:
>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.pom
>> [WARNING] The POM for com.typesafe.akka:akka-slf4j:jar:2.0.5 is missing,
>> no dependency information available
>> Downloading:
>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.jar
>> Downloading:
>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.jar
>> Downloading:
>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.jar
>>
>>
>>
>> then, throw Error:
>> [ERROR] Failed to execute goal on project spark-core_2.9.3: Could not
>> resolve dependencies for project
>> org.apache.spark:spark-core_2.9.3:jar:0.8.1-incubating: The following
>> artifacts could not be resolved: com.typesafe.akka:akka-actor:jar:2.0.5,
>> com.typesafe.akka:akka-remote:jar:2.0.5,
>> com.typesafe.akka:akka-slf4j:jar:2.0.5: Could not find artifact
>> com.typesafe.akka:akka-actor:jar:2.0.5 in maven-nexus (
>> http://repo1.maven.org/maven2/) -> [Help 1]
>>
>>
>>
>> I checked on the http://repo1.maven.org/maven2, there is no akka-*
>>  2.0.5, and there is no URL such as '
>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5', it's
>> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor_'VERSION'/
>>
>>
>>
>> On Mon, Dec 9, 2013 at 9:54 AM, Azuryy Yu  wrote:
>>
>>> Thanks Mark, I will.
>>>
>>>
>>> On Mon, Dec 9, 2013 at 9:53 AM, Mark Hamstra wrote:
>>>
 You probably want to try the current release candidate:
 https://github.com/apache/incubator-spark/archive/v0.8.1-incubating.tar.gz


 On Sun, Dec 8, 2013 at 5:51 PM, Azuryy Yu  wrote:

> Thanks Matei, I'll try.
>
> @Mark, I download source package from
> http://spark-project.org/download/spark-0.8.0-incubating.tgz, Sorry,
> I build 0.8.0. not 0.8.1.
>
>
> On Mon, Dec 9, 2013 at 9:31 AM, Mark Hamstra 
> wrote:
>
>> There is no released source package of Spark 0.8.1.  It's just gone
>> into release candidate in the past day.  Is that what you are trying to
>> build?  It should be exactly the same as what I just checked out from the
>> v0.8.1-incubating tag.
>>
>>
>> On Sun, Dec 8, 2013 at 5:28 PM, Azuryy Yu  wrote:
>>
>>> I am not check out from repository, I download source package and
>>> build.
>>>  On 2013-12-09 9:22 AM, "Mark Hamstra" 
>>> wrote:
>>>
 I don't believe that is true of the Spark 0.8.1 code.  I just got
 done building Spark from the v0.8.1-incubating tag after first removing
 anything to do with akka from my ~/.m2/repository.  After a successful
 build without incident, my local repo now only contains akka 2.0.5 
 packages
 within the com/typesafe/akka subtree.



 On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu wrote:

> I build 0.8.1, maven try to download akka-actor-2.0.1, which is
> used by scala-core-io.
>  On 2013-12-09 8:40 AM, "Matei Zaharia" 
> wrote:
>
>> Which version of Spark are you building? AFAIK it should be using
>> Akka 2.0.5, not 2.0.1.
>>
>> Matei
>>
>> On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:
>>
>> any thoughs here? I still cannot compile spark using maven,
>> thanks for any inputs.
>>  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>>
>>> Hey dears,
>>>
>>> Can you give me a maven repo, so I can compile Spark with Maven.
>>>
>>> I'm using http://repo1.maven.org/maven2/ currently
>>>
>>> but It complains cannot find akka-actor-2.0.1,  I searched on
>>> the repo1.maven, and I am also cannot find akka-actor-2.0.1, which 
>>> is too
>>> old.
>>>
>>> another strange output I can see:
>>> 2.9.3  in the pom,
>>> but Maven download scala-

Re: Build Spark with maven

2013-12-08 Thread Azuryy Yu
@Mark, what maven mirror are you using in your settings.xml? Can you share it
with me? Thanks.



On Mon, Dec 9, 2013 at 10:14 AM, Azuryy Yu  wrote:

> Hi Mark,
>
> I build the current releast candidate,
> It complained during build:
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.pom
> [WARNING] The POM for com.typesafe.akka:akka-actor:jar:2.0.5 is missing,
> no dependency information available
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.pom
> [WARNING] The POM for com.typesafe.akka:akka-remote:jar:2.0.5 is missing,
> no dependency information available
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.pom
> [WARNING] The POM for com.typesafe.akka:akka-slf4j:jar:2.0.5 is missing,
> no dependency information available
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.jar
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.jar
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.jar
>
>
>
> then, throw Error:
> [ERROR] Failed to execute goal on project spark-core_2.9.3: Could not
> resolve dependencies for project
> org.apache.spark:spark-core_2.9.3:jar:0.8.1-incubating: The following
> artifacts could not be resolved: com.typesafe.akka:akka-actor:jar:2.0.5,
> com.typesafe.akka:akka-remote:jar:2.0.5,
> com.typesafe.akka:akka-slf4j:jar:2.0.5: Could not find artifact
> com.typesafe.akka:akka-actor:jar:2.0.5 in maven-nexus (
> http://repo1.maven.org/maven2/) -> [Help 1]
>
>
>
> I checked on the http://repo1.maven.org/maven2, there is no akka-*
>  2.0.5, and there is no URL such as '
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5', it's
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor_'VERSION'/
>
>
>
> On Mon, Dec 9, 2013 at 9:54 AM, Azuryy Yu  wrote:
>
>> Thanks Mark, I will.
>>
>>
>> On Mon, Dec 9, 2013 at 9:53 AM, Mark Hamstra wrote:
>>
>>> You probably want to try the current release candidate:
>>> https://github.com/apache/incubator-spark/archive/v0.8.1-incubating.tar.gz
>>>
>>>
>>> On Sun, Dec 8, 2013 at 5:51 PM, Azuryy Yu  wrote:
>>>
 Thanks Matei, I'll try.

 @Mark, I download source package from
 http://spark-project.org/download/spark-0.8.0-incubating.tgz, Sorry, I
 build 0.8.0. not 0.8.1.


 On Mon, Dec 9, 2013 at 9:31 AM, Mark Hamstra 
 wrote:

> There is no released source package of Spark 0.8.1.  It's just gone
> into release candidate in the past day.  Is that what you are trying to
> build?  It should be exactly the same as what I just checked out from the
> v0.8.1-incubating tag.
>
>
> On Sun, Dec 8, 2013 at 5:28 PM, Azuryy Yu  wrote:
>
>> I am not check out from repository, I download source package and
>> build.
>>  On 2013-12-09 9:22 AM, "Mark Hamstra" 
>> wrote:
>>
>>> I don't believe that is true of the Spark 0.8.1 code.  I just got
>>> done building Spark from the v0.8.1-incubating tag after first removing
>>> anything to do with akka from my ~/.m2/repository.  After a successful
>>> build without incident, my local repo now only contains akka 2.0.5 
>>> packages
>>> within the com/typesafe/akka subtree.
>>>
>>>
>>>
>>> On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu wrote:
>>>
 I build 0.8.1, maven try to download akka-actor-2.0.1, which is
 used by scala-core-io.
  On 2013-12-09 8:40 AM, "Matei Zaharia" 
 wrote:

> Which version of Spark are you building? AFAIK it should be using
> Akka 2.0.5, not 2.0.1.
>
> Matei
>
> On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:
>
> any thoughs here? I still cannot compile spark using maven, thanks
> for any inputs.
>  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>
>> Hey dears,
>>
>> Can you give me a maven repo, so I can compile Spark with Maven.
>>
>> I'm using http://repo1.maven.org/maven2/ currently
>>
>> but It complains cannot find akka-actor-2.0.1,  I searched on the
>> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is 
>> too old.
>>
>> another strange output I can see:
>> 2.9.3  in the pom,
>> but Maven download scala-2.9.2 during compile, why is that?
>>
>>
>> Thanks.
>>
>
>
>>>
>

>>>
>>
>


Re: Build Spark with maven

2013-12-08 Thread Mark Hamstra
Maven should retrieve the akka libraries from http://repo.akka.io/releases/
-- see the <repositories> section of spark/pom.xml.  Do you have nexus
settings in, e.g., a ~/.m2/settings.xml file that are interfering with
fetching from that repository?


On Sun, Dec 8, 2013 at 6:14 PM, Azuryy Yu  wrote:

> Hi Mark,
>
> I build the current releast candidate,
> It complained during build:
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.pom
> [WARNING] The POM for com.typesafe.akka:akka-actor:jar:2.0.5 is missing,
> no dependency information available
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.pom
> [WARNING] The POM for com.typesafe.akka:akka-remote:jar:2.0.5 is missing,
> no dependency information available
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.pom
> [WARNING] The POM for com.typesafe.akka:akka-slf4j:jar:2.0.5 is missing,
> no dependency information available
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.jar
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.jar
> Downloading:
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.jar
>
>
>
> then, throw Error:
> [ERROR] Failed to execute goal on project spark-core_2.9.3: Could not
> resolve dependencies for project
> org.apache.spark:spark-core_2.9.3:jar:0.8.1-incubating: The following
> artifacts could not be resolved: com.typesafe.akka:akka-actor:jar:2.0.5,
> com.typesafe.akka:akka-remote:jar:2.0.5,
> com.typesafe.akka:akka-slf4j:jar:2.0.5: Could not find artifact
> com.typesafe.akka:akka-actor:jar:2.0.5 in maven-nexus (
> http://repo1.maven.org/maven2/) -> [Help 1]
>
>
>
> I checked on the http://repo1.maven.org/maven2, there is no akka-*
>  2.0.5, and there is no URL such as '
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5', it's
> http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor_'VERSION'/
>
>
>
> On Mon, Dec 9, 2013 at 9:54 AM, Azuryy Yu  wrote:
>
>> Thanks Mark, I will.
>>
>>
>> On Mon, Dec 9, 2013 at 9:53 AM, Mark Hamstra wrote:
>>
>>> You probably want to try the current release candidate:
>>> https://github.com/apache/incubator-spark/archive/v0.8.1-incubating.tar.gz
>>>
>>>
>>> On Sun, Dec 8, 2013 at 5:51 PM, Azuryy Yu  wrote:
>>>
 Thanks Matei, I'll try.

 @Mark, I download source package from
 http://spark-project.org/download/spark-0.8.0-incubating.tgz, Sorry, I
 build 0.8.0. not 0.8.1.


 On Mon, Dec 9, 2013 at 9:31 AM, Mark Hamstra 
 wrote:

> There is no released source package of Spark 0.8.1.  It's just gone
> into release candidate in the past day.  Is that what you are trying to
> build?  It should be exactly the same as what I just checked out from the
> v0.8.1-incubating tag.
>
>
> On Sun, Dec 8, 2013 at 5:28 PM, Azuryy Yu  wrote:
>
>> I am not check out from repository, I download source package and
>> build.
>>  On 2013-12-09 9:22 AM, "Mark Hamstra" 
>> wrote:
>>
>>> I don't believe that is true of the Spark 0.8.1 code.  I just got
>>> done building Spark from the v0.8.1-incubating tag after first removing
>>> anything to do with akka from my ~/.m2/repository.  After a successful
>>> build without incident, my local repo now only contains akka 2.0.5 
>>> packages
>>> within the com/typesafe/akka subtree.
>>>
>>>
>>>
>>> On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu wrote:
>>>
 I build 0.8.1, maven try to download akka-actor-2.0.1, which is
 used by scala-core-io.
  On 2013-12-09 8:40 AM, "Matei Zaharia" 
 wrote:

> Which version of Spark are you building? AFAIK it should be using
> Akka 2.0.5, not 2.0.1.
>
> Matei
>
> On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:
>
> any thoughs here? I still cannot compile spark using maven, thanks
> for any inputs.
>  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>
>> Hey dears,
>>
>> Can you give me a maven repo, so I can compile Spark with Maven.
>>
>> I'm using http://repo1.maven.org/maven2/ currently
>>
>> but It complains cannot find akka-actor-2.0.1,  I searched on the
>> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is 
>> too old.
>>
>> another strange output I can see:
>> 2.9.3  in the pom,
>> but Maven download scala-2.9.2 during compile, why is that?
>>
>>
>> Thanks.
>>
>
>
>>>
>

>>>
>>
>


Re: Build Spark with maven

2013-12-08 Thread Azuryy Yu
Hi Mark,

I built the current release candidate.
It complained during the build:
Downloading:
http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.pom
[WARNING] The POM for com.typesafe.akka:akka-actor:jar:2.0.5 is missing, no
dependency information available
Downloading:
http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.pom
[WARNING] The POM for com.typesafe.akka:akka-remote:jar:2.0.5 is missing,
no dependency information available
Downloading:
http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.pom
[WARNING] The POM for com.typesafe.akka:akka-slf4j:jar:2.0.5 is missing, no
dependency information available
Downloading:
http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.jar
Downloading:
http://repo1.maven.org/maven2/com/typesafe/akka/akka-remote/2.0.5/akka-remote-2.0.5.jar
Downloading:
http://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j/2.0.5/akka-slf4j-2.0.5.jar



Then it threw an error:
[ERROR] Failed to execute goal on project spark-core_2.9.3: Could not
resolve dependencies for project
org.apache.spark:spark-core_2.9.3:jar:0.8.1-incubating: The following
artifacts could not be resolved: com.typesafe.akka:akka-actor:jar:2.0.5,
com.typesafe.akka:akka-remote:jar:2.0.5,
com.typesafe.akka:akka-slf4j:jar:2.0.5: Could not find artifact
com.typesafe.akka:akka-actor:jar:2.0.5 in maven-nexus (
http://repo1.maven.org/maven2/) -> [Help 1]



I checked http://repo1.maven.org/maven2: there is no akka-* 2.0.5, and there is
no URL such as
'http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5'; it's
http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor_'VERSION'/



On Mon, Dec 9, 2013 at 9:54 AM, Azuryy Yu  wrote:

> Thanks Mark, I will.
>
>
> On Mon, Dec 9, 2013 at 9:53 AM, Mark Hamstra wrote:
>
>> You probably want to try the current release candidate:
>> https://github.com/apache/incubator-spark/archive/v0.8.1-incubating.tar.gz
>>
>>
>> On Sun, Dec 8, 2013 at 5:51 PM, Azuryy Yu  wrote:
>>
>>> Thanks Matei, I'll try.
>>>
>>> @Mark, I download source package from
>>> http://spark-project.org/download/spark-0.8.0-incubating.tgz, Sorry, I
>>> build 0.8.0. not 0.8.1.
>>>
>>>
>>> On Mon, Dec 9, 2013 at 9:31 AM, Mark Hamstra wrote:
>>>
 There is no released source package of Spark 0.8.1.  It's just gone
 into release candidate in the past day.  Is that what you are trying to
 build?  It should be exactly the same as what I just checked out from the
 v0.8.1-incubating tag.


 On Sun, Dec 8, 2013 at 5:28 PM, Azuryy Yu  wrote:

> I am not check out from repository, I download source package and
> build.
>  On 2013-12-09 9:22 AM, "Mark Hamstra" 
> wrote:
>
>> I don't believe that is true of the Spark 0.8.1 code.  I just got
>> done building Spark from the v0.8.1-incubating tag after first removing
>> anything to do with akka from my ~/.m2/repository.  After a successful
>> build without incident, my local repo now only contains akka 2.0.5 
>> packages
>> within the com/typesafe/akka subtree.
>>
>>
>>
>> On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu  wrote:
>>
>>> I build 0.8.1, maven try to download akka-actor-2.0.1, which is used
>>> by scala-core-io.
>>>  On 2013-12-09 8:40 AM, "Matei Zaharia" 
>>> wrote:
>>>
 Which version of Spark are you building? AFAIK it should be using
 Akka 2.0.5, not 2.0.1.

 Matei

 On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:

 any thoughs here? I still cannot compile spark using maven, thanks
 for any inputs.
  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:

> Hey dears,
>
> Can you give me a maven repo, so I can compile Spark with Maven.
>
> I'm using http://repo1.maven.org/maven2/ currently
>
> but It complains cannot find akka-actor-2.0.1,  I searched on the
> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too 
> old.
>
> another strange output I can see:
> 2.9.3  in the pom,
> but Maven download scala-2.9.2 during compile, why is that?
>
>
> Thanks.
>


>>

>>>
>>
>


Re: Build Spark with maven

2013-12-08 Thread Azuryy Yu
Thanks Mark, I will.


On Mon, Dec 9, 2013 at 9:53 AM, Mark Hamstra wrote:

> You probably want to try the current release candidate:
> https://github.com/apache/incubator-spark/archive/v0.8.1-incubating.tar.gz
>
>
> On Sun, Dec 8, 2013 at 5:51 PM, Azuryy Yu  wrote:
>
>> Thanks Matei, I'll try.
>>
>> @Mark, I download source package from
>> http://spark-project.org/download/spark-0.8.0-incubating.tgz, Sorry, I
>> build 0.8.0. not 0.8.1.
>>
>>
>> On Mon, Dec 9, 2013 at 9:31 AM, Mark Hamstra wrote:
>>
>>> There is no released source package of Spark 0.8.1.  It's just gone into
>>> release candidate in the past day.  Is that what you are trying to build?
>>>  It should be exactly the same as what I just checked out from the
>>> v0.8.1-incubating tag.
>>>
>>>
>>> On Sun, Dec 8, 2013 at 5:28 PM, Azuryy Yu  wrote:
>>>
 I am not check out from repository, I download source package and build.
  On 2013-12-09 9:22 AM, "Mark Hamstra"  wrote:

> I don't believe that is true of the Spark 0.8.1 code.  I just got done
> building Spark from the v0.8.1-incubating tag after first removing 
> anything
> to do with akka from my ~/.m2/repository.  After a successful build 
> without
> incident, my local repo now only contains akka 2.0.5 packages within the
> com/typesafe/akka subtree.
>
>
>
> On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu  wrote:
>
>> I build 0.8.1, maven try to download akka-actor-2.0.1, which is used
>> by scala-core-io.
>>  On 2013-12-09 8:40 AM, "Matei Zaharia" 
>> wrote:
>>
>>> Which version of Spark are you building? AFAIK it should be using
>>> Akka 2.0.5, not 2.0.1.
>>>
>>> Matei
>>>
>>> On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:
>>>
>>> any thoughs here? I still cannot compile spark using maven, thanks
>>> for any inputs.
>>>  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>>>
 Hey dears,

 Can you give me a maven repo, so I can compile Spark with Maven.

 I'm using http://repo1.maven.org/maven2/ currently

 but It complains cannot find akka-actor-2.0.1,  I searched on the
 repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too 
 old.

 another strange output I can see:
 2.9.3  in the pom,
 but Maven download scala-2.9.2 during compile, why is that?


 Thanks.

>>>
>>>
>
>>>
>>
>


Re: Build Spark with maven

2013-12-08 Thread Mark Hamstra
You probably want to try the current release candidate:
https://github.com/apache/incubator-spark/archive/v0.8.1-incubating.tar.gz


On Sun, Dec 8, 2013 at 5:51 PM, Azuryy Yu  wrote:

> Thanks Matei, I'll try.
>
> @Mark, I download source package from
> http://spark-project.org/download/spark-0.8.0-incubating.tgz, Sorry, I
> build 0.8.0. not 0.8.1.
>
>
> On Mon, Dec 9, 2013 at 9:31 AM, Mark Hamstra wrote:
>
>> There is no released source package of Spark 0.8.1.  It's just gone into
>> release candidate in the past day.  Is that what you are trying to build?
>>  It should be exactly the same as what I just checked out from the
>> v0.8.1-incubating tag.
>>
>>
>> On Sun, Dec 8, 2013 at 5:28 PM, Azuryy Yu  wrote:
>>
>>> I am not check out from repository, I download source package and build.
>>>  On 2013-12-09 9:22 AM, "Mark Hamstra"  wrote:
>>>
 I don't believe that is true of the Spark 0.8.1 code.  I just got done
 building Spark from the v0.8.1-incubating tag after first removing anything
 to do with akka from my ~/.m2/repository.  After a successful build without
 incident, my local repo now only contains akka 2.0.5 packages within the
 com/typesafe/akka subtree.



 On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu  wrote:

> I build 0.8.1, maven try to download akka-actor-2.0.1, which is used
> by scala-core-io.
>  On 2013-12-09 8:40 AM, "Matei Zaharia" 
> wrote:
>
>> Which version of Spark are you building? AFAIK it should be using
>> Akka 2.0.5, not 2.0.1.
>>
>> Matei
>>
>> On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:
>>
>> any thoughs here? I still cannot compile spark using maven, thanks
>> for any inputs.
>>  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>>
>>> Hey dears,
>>>
>>> Can you give me a maven repo, so I can compile Spark with Maven.
>>>
>>> I'm using http://repo1.maven.org/maven2/ currently
>>>
>>> but It complains cannot find akka-actor-2.0.1,  I searched on the
>>> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too 
>>> old.
>>>
>>> another strange output I can see:
>>> 2.9.3  in the pom,
>>> but Maven download scala-2.9.2 during compile, why is that?
>>>
>>>
>>> Thanks.
>>>
>>
>>

>>
>


Re: Build Spark with maven

2013-12-08 Thread Azuryy Yu
Thanks Matei, I'll try.

@Mark, I downloaded the source package from
http://spark-project.org/download/spark-0.8.0-incubating.tgz. Sorry, I
built 0.8.0, not 0.8.1.


On Mon, Dec 9, 2013 at 9:31 AM, Mark Hamstra wrote:

> There is no released source package of Spark 0.8.1.  It's just gone into
> release candidate in the past day.  Is that what you are trying to build?
>  It should be exactly the same as what I just checked out from the
> v0.8.1-incubating tag.
>
>
> On Sun, Dec 8, 2013 at 5:28 PM, Azuryy Yu  wrote:
>
>> I am not check out from repository, I download source package and build.
>>  On 2013-12-09 9:22 AM, "Mark Hamstra"  wrote:
>>
>>> I don't believe that is true of the Spark 0.8.1 code.  I just got done
>>> building Spark from the v0.8.1-incubating tag after first removing anything
>>> to do with akka from my ~/.m2/repository.  After a successful build without
>>> incident, my local repo now only contains akka 2.0.5 packages within the
>>> com/typesafe/akka subtree.
>>>
>>>
>>>
>>> On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu  wrote:
>>>
 I build 0.8.1, maven try to download akka-actor-2.0.1, which is used by
 scala-core-io.
  On 2013-12-09 8:40 AM, "Matei Zaharia" 
 wrote:

> Which version of Spark are you building? AFAIK it should be using Akka
> 2.0.5, not 2.0.1.
>
> Matei
>
> On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:
>
> any thoughs here? I still cannot compile spark using maven, thanks for
> any inputs.
>  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>
>> Hey dears,
>>
>> Can you give me a maven repo, so I can compile Spark with Maven.
>>
>> I'm using http://repo1.maven.org/maven2/ currently
>>
>> but It complains cannot find akka-actor-2.0.1,  I searched on the
>> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too 
>> old.
>>
>> another strange output I can see:
>> 2.9.3  in the pom,
>> but Maven download scala-2.9.2 during compile, why is that?
>>
>>
>> Thanks.
>>
>
>
>>>
>


Re: Bump: on disk storage formats

2013-12-08 Thread Patrick Wendell
This is a very open ended question so it's hard to give a specific
answer... it depends a lot on whether disk IO is a bottleneck in your
workload and whether you tend to analyze all of each record or only
certain fields. If you are doing disk IO a lot and only touching a few
fields something like Parquet might help, or (simpler) just creating
smaller projections of your data with only the fields you care about.
Tab delimited formats can have less serialization overhead than JSON,
so flattening the data might also help. It really depends on your
access patterns and data types.

In many cases with Spark another important question is how the user
stores the data in-memory, not the on-disk format. It does depend how
they are using Spark though.
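
For instance, the "smaller projections" idea could look like the sketch below
(the input path and column layout are hypothetical, not from this thread): keep
only the two or three columns the analysis actually touches and write them back
out tab-delimited, so later jobs scan far less data.

import org.apache.spark.SparkContext

val sc = new SparkContext("local", "projection-example")

sc.textFile("hdfs:///logs/requests.csv")                        // hypothetical input
  .map(_.split(",", -1))
  .filter(_.length > 8)                                         // skip malformed rows
  .map(cols => Seq(cols(0), cols(7), cols(8)).mkString("\t"))   // e.g. account id, user agent, ip
  .saveAsTextFile("hdfs:///logs/requests_projection")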

- Patrick

On Sun, Dec 8, 2013 at 3:03 PM, Andrew Ash  wrote:
> LZO compression at a minimum, and using Parquet as a second step,
> seems like the way to go though I haven't tried either personally yet.
>
> Sent from my mobile phone
>
> On Dec 8, 2013, at 16:54, Ankur Chauhan  wrote:
>
>> Hi all,
>>
>> Sorry for posting this again but I am interested in finding out what 
>> different on disk data formats for storing timeline event and analytics 
>> aggregate data.
>>
>> Currently I am just using newline delimited json gzipped files. I was 
>> wondering if there were any recommendations.
>>
>> -- Ankur


Re: Build Spark with maven

2013-12-08 Thread Mark Hamstra
There is no released source package of Spark 0.8.1.  It's just gone into
release candidate in the past day.  Is that what you are trying to build?
 It should be exactly the same as what I just checked out from the
v0.8.1-incubating tag.


On Sun, Dec 8, 2013 at 5:28 PM, Azuryy Yu  wrote:

> I am not check out from repository, I download source package and build.
>  On 2013-12-09 9:22 AM, "Mark Hamstra"  wrote:
>
>> I don't believe that is true of the Spark 0.8.1 code.  I just got done
>> building Spark from the v0.8.1-incubating tag after first removing anything
>> to do with akka from my ~/.m2/repository.  After a successful build without
>> incident, my local repo now only contains akka 2.0.5 packages within the
>> com/typesafe/akka subtree.
>>
>>
>>
>> On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu  wrote:
>>
>>> I build 0.8.1, maven try to download akka-actor-2.0.1, which is used by
>>> scala-core-io.
>>>  On 2013-12-09 8:40 AM, "Matei Zaharia"  wrote:
>>>
 Which version of Spark are you building? AFAIK it should be using Akka
 2.0.5, not 2.0.1.

 Matei

 On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:

 any thoughs here? I still cannot compile spark using maven, thanks for
 any inputs.
  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:

> Hey dears,
>
> Can you give me a maven repo, so I can compile Spark with Maven.
>
> I'm using http://repo1.maven.org/maven2/ currently
>
> but It complains cannot find akka-actor-2.0.1,  I searched on the
> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too old.
>
> another strange output I can see:
> 2.9.3  in the pom,
> but Maven download scala-2.9.2 during compile, why is that?
>
>
> Thanks.
>


>>


Re: Build Spark with maven

2013-12-08 Thread Azuryy Yu
I did not check it out from the repository; I downloaded the source package and built it.
 On 2013-12-09 9:22 AM, "Mark Hamstra"  wrote:

> I don't believe that is true of the Spark 0.8.1 code.  I just got done
> building Spark from the v0.8.1-incubating tag after first removing anything
> to do with akka from my ~/.m2/repository.  After a successful build without
> incident, my local repo now only contains akka 2.0.5 packages within the
> com/typesafe/akka subtree.
>
>
>
> On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu  wrote:
>
>> I build 0.8.1, maven try to download akka-actor-2.0.1, which is used by
>> scala-core-io.
>>  On 2013-12-09 8:40 AM, "Matei Zaharia"  wrote:
>>
>>> Which version of Spark are you building? AFAIK it should be using Akka
>>> 2.0.5, not 2.0.1.
>>>
>>> Matei
>>>
>>> On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:
>>>
>>> any thoughs here? I still cannot compile spark using maven, thanks for
>>> any inputs.
>>>  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>>>
 Hey dears,

 Can you give me a maven repo, so I can compile Spark with Maven.

 I'm using http://repo1.maven.org/maven2/ currently

 but It complains cannot find akka-actor-2.0.1,  I searched on the
 repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too old.

 another strange output I can see:
 2.9.3  in the pom,
 but Maven download scala-2.9.2 during compile, why is that?


 Thanks.

>>>
>>>
>


Re: Build Spark with maven

2013-12-08 Thread Matei Zaharia
Yeah, maybe you have weird versions of something published locally. Try 
deleting your ~/.m2 and ~/.ivy2 directories and redoing the build. 
Unfortunately this will take a while to re-download stuff, but it should work 
out.

Matei

On Dec 8, 2013, at 5:21 PM, Mark Hamstra  wrote:

> I don't believe that is true of the Spark 0.8.1 code.  I just got done 
> building Spark from the v0.8.1-incubating tag after first removing anything 
> to do with akka from my ~/.m2/repository.  After a successful build without 
> incident, my local repo now only contains akka 2.0.5 packages within the 
> com/typesafe/akka subtree.
> 
> 
> 
> On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu  wrote:
> I build 0.8.1, maven try to download akka-actor-2.0.1, which is used by 
> scala-core-io.
> On 2013-12-09 8:40 AM, "Matei Zaharia"  wrote:
> Which version of Spark are you building? AFAIK it should be using Akka 2.0.5, 
> not 2.0.1.
> 
> Matei
> 
> On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:
> 
>> any thoughs here? I still cannot compile spark using maven, thanks for any 
>> inputs.
>> On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>> Hey dears,
>> 
>> Can you give me a maven repo, so I can compile Spark with Maven.
>> 
>> I'm using http://repo1.maven.org/maven2/ currently
>> 
>> but It complains cannot find akka-actor-2.0.1,  I searched on the 
>> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too old.
>> 
>> another strange output I can see:
>> 2.9.3  in the pom, 
>> but Maven download scala-2.9.2 during compile, why is that?
>> 
>> 
>> Thanks.
> 
> 



Re: Build Spark with maven

2013-12-08 Thread Mark Hamstra
I don't believe that is true of the Spark 0.8.1 code.  I just got done
building Spark from the v0.8.1-incubating tag after first removing anything
to do with akka from my ~/.m2/repository.  After a successful build without
incident, my local repo now only contains akka 2.0.5 packages within the
com/typesafe/akka subtree.



On Sun, Dec 8, 2013 at 5:01 PM, Azuryy Yu  wrote:

> I build 0.8.1, maven try to download akka-actor-2.0.1, which is used by
> scala-core-io.
>  On 2013-12-09 8:40 AM, "Matei Zaharia"  wrote:
>
>> Which version of Spark are you building? AFAIK it should be using Akka
>> 2.0.5, not 2.0.1.
>>
>> Matei
>>
>> On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:
>>
>> any thoughs here? I still cannot compile spark using maven, thanks for
>> any inputs.
>>  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>>
>>> Hey dears,
>>>
>>> Can you give me a maven repo, so I can compile Spark with Maven.
>>>
>>> I'm using http://repo1.maven.org/maven2/ currently
>>>
>>> but It complains cannot find akka-actor-2.0.1,  I searched on the
>>> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too old.
>>>
>>> another strange output I can see:
>>> 2.9.3  in the pom,
>>> but Maven download scala-2.9.2 during compile, why is that?
>>>
>>>
>>> Thanks.
>>>
>>
>>


Re: Build Spark with maven

2013-12-08 Thread Azuryy Yu
I am building 0.8.1, and Maven tries to download akka-actor-2.0.1, which is used by
scala-core-io.
 On 2013-12-09 8:40 AM, "Matei Zaharia"  wrote:

> Which version of Spark are you building? AFAIK it should be using Akka
> 2.0.5, not 2.0.1.
>
> Matei
>
> On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:
>
> any thoughs here? I still cannot compile spark using maven, thanks for any
> inputs.
>  On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
>
>> Hey dears,
>>
>> Can you give me a maven repo, so I can compile Spark with Maven.
>>
>> I'm using http://repo1.maven.org/maven2/ currently
>>
>> but It complains cannot find akka-actor-2.0.1,  I searched on the
>> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too old.
>>
>> another strange output I can see:
>> 2.9.3  in the pom,
>> but Maven download scala-2.9.2 during compile, why is that?
>>
>>
>> Thanks.
>>
>
>


Re: Build Spark with maven

2013-12-08 Thread Matei Zaharia
Which version of Spark are you building? AFAIK it should be using Akka 2.0.5, 
not 2.0.1.

Matei

On Dec 8, 2013, at 3:35 PM, Azuryy Yu  wrote:

> any thoughs here? I still cannot compile spark using maven, thanks for any 
> inputs.
> On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:
> Hey dears,
> 
> Can you give me a maven repo, so I can compile Spark with Maven.
> 
> I'm using http://repo1.maven.org/maven2/ currently
> 
> but It complains cannot find akka-actor-2.0.1,  I searched on the 
> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too old.
> 
> another strange output I can see:
> 2.9.3  in the pom, 
> but Maven download scala-2.9.2 during compile, why is that?
> 
> 
> Thanks.



Re: Build Spark with maven

2013-12-08 Thread Azuryy Yu
Any thoughts here? I still cannot compile Spark using Maven; thanks for any
input.
 On 2013-12-07 2:31 PM, "Azuryy Yu"  wrote:

> Hey dears,
>
> Can you give me a maven repo, so I can compile Spark with Maven.
>
> I'm using http://repo1.maven.org/maven2/ currently
>
> but It complains cannot find akka-actor-2.0.1,  I searched on the
> repo1.maven, and I am also cannot find akka-actor-2.0.1, which is too old.
>
> another strange output I can see:
> 2.9.3  in the pom,
> but Maven download scala-2.9.2 during compile, why is that?
>
>
> Thanks.
>


Re: Bump: on disk storage formats

2013-12-08 Thread Andrew Ash
LZO compression at a minimum, and Parquet as a second step, seems like the
way to go, though I haven't tried either personally yet.

Sent from my mobile phone

On Dec 8, 2013, at 16:54, Ankur Chauhan  wrote:

> Hi all,
>
> Sorry for posting this again but I am interested in finding out what 
> different on disk data formats for storing timeline event and analytics 
> aggregate data.
>
> Currently I am just using newline delimited json gzipped files. I was 
> wondering if there were any recommendations.
>
> -- Ankur


Bump: on disk storage formats

2013-12-08 Thread Ankur Chauhan
Hi all,

Sorry for posting this again, but I am interested in finding out which on-disk
data formats people use for storing timeline event and analytics aggregate data.

Currently I am just using newline-delimited, gzipped JSON files. I was wondering
if there were any recommendations.

-- Ankur 

Re: Biggest spark.akka.framesize possible

2013-12-08 Thread Shangyu Luo
OK, that is clear.
But what about collect() and collectAsMap()? Is it possible that Spark throws a
'java heap space' error or a communication error because of a small
spark.akka.frameSize? Currently I have set it to 1024.
Thank you!

Best,
Shangyu


2013/12/8 Matei Zaharia 

> As I said, it should not affect performance of transformations on RDDs,
> only of sending tasks to the workers and getting results back. In general,
> you want the Akka frame size to be as small as possible while still holding
> your largest task or result; as long as your application isn’t throwing an
> error due to the frame size being too small, you’re fine. Having a bigger
> frame size will result in wasted space and unneeded memory allocation for
> buffers. It doesn’t make the communication more efficient.
>
> Matei
>
>
> On Dec 8, 2013, at 12:57 PM, Shangyu Luo  wrote:
>
> I would like to know the maximum value for spark.akka.framesize, too and I
> am wondering if it will affect the performance of reduceByKey().
> Thanks!
>
>
> 2013/12/8 Matei Zaharia 
>
>> Hey Matt,
>>
>> This setting shouldn’t really affect groupBy operations, because they
>> don’t go through Akka. The frame size setting is for messages from the
>> master to workers (specifically, sending out tasks), and for results that
>> go directly from workers to the application (e.g. collect()). So it
>> shouldn’t be a problem unless these are large. In Spark 0.8.1, results back
>> to the master will be sent in a different way if they’re large, so the
>> setting will only cover task sizes.
>>
>> Matei
>>
>> On Dec 7, 2013, at 10:20 PM, Matt Cheah  wrote:
>>
>>  Hi everyone,
>>
>>  I'm noticing like others that group-By operations with large sized
>> groups gives Spark some trouble. Increasing the spark.akka.frameSize
>> property alleviates it up to a point.
>>
>>  I was wondering what the maximum setting for this value is. I've seen
>> previous e-mails talking about the ramifications of turning up this value,
>> but I was wondering what the actual maximum number that could be set for it
>> is. I'll benchmark the performance hit accordingly.
>>
>>  Thanks!
>>
>>  -Matt Cheah
>>
>>
>>
>
>
> --
> --
>
> Shangyu, Luo
>
>
>


-- 
--

Shangyu, Luo


Re: Biggest spark.akka.framesize possible

2013-12-08 Thread Matei Zaharia
As I said, it should not affect performance of transformations on RDDs, only of 
sending tasks to the workers and getting results back. In general, you want the 
Akka frame size to be as small as possible while still holding your largest 
task or result; as long as your application isn’t throwing an error due to the 
frame size being too small, you’re fine. Having a bigger frame size will result 
in wasted space and unneeded memory allocation for buffers. It doesn’t make the 
communication more efficient.
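
For what it's worth, in 0.8-era Spark this setting is read from a Java system
property, so it has to be set before the SparkContext is created. A minimal
sketch follows; the value (in MB) is purely illustrative, not a recommendation:

import org.apache.spark.SparkContext

object FrameSizeExample {
  def main(args: Array[String]) {
    // Size this to the largest task or result you expect, not larger.
    System.setProperty("spark.akka.frameSize", "64")

    val sc = new SparkContext("local", "FrameSizeExample")
    // ... build RDDs and run jobs as usual ...
    sc.stop()
  }
}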

Matei

On Dec 8, 2013, at 12:57 PM, Shangyu Luo  wrote:

> I would like to know the maximum value for spark.akka.framesize, too and I am 
> wondering if it will affect the performance of reduceByKey().
> Thanks!
> 
> 
> 2013/12/8 Matei Zaharia 
> Hey Matt,
> 
> This setting shouldn’t really affect groupBy operations, because they don’t 
> go through Akka. The frame size setting is for messages from the master to 
> workers (specifically, sending out tasks), and for results that go directly 
> from workers to the application (e.g. collect()). So it shouldn’t be a 
> problem unless these are large. In Spark 0.8.1, results back to the master 
> will be sent in a different way if they’re large, so the setting will only 
> cover task sizes.
> 
> Matei
> 
> On Dec 7, 2013, at 10:20 PM, Matt Cheah  wrote:
> 
>> Hi everyone,
>> 
>> I'm noticing like others that group-By operations with large sized groups 
>> gives Spark some trouble. Increasing the spark.akka.frameSize property 
>> alleviates it up to a point.
>> 
>> I was wondering what the maximum setting for this value is. I've seen 
>> previous e-mails talking about the ramifications of turning up this value, 
>> but I was wondering what the actual maximum number that could be set for it 
>> is. I'll benchmark the performance hit accordingly.
>> 
>> Thanks!
>> 
>> -Matt Cheah
> 
> 
> 
> 
> -- 
> --
> 
> Shangyu, Luo
> 



Re: Newbie questions

2013-12-08 Thread Matei Zaharia
Hi Kenneth,

> 1.   Is Spark suited for online learning algorithms? From what I’ve read 
> so far (mainly from this slide), it seems not but I could be wrong.

You can probably use Spark Streaming 
(http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html)
 to implement online algorithms. I know at least one group that implemented an 
online version of K-means this way. Spark also comes with a machine learning 
library that currently only has static versions (not for streaming data).
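
A minimal sketch of the streaming flavor (not the K-means example mentioned
above; the host, port, and one-second batch interval are illustrative
assumptions):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object StreamingCounts {
  def main(args: Array[String]) {
    // One-second batches; "local[2]" leaves a core free for the receiver.
    val ssc = new StreamingContext("local[2]", "StreamingCounts", Seconds(1))

    // Treat each line arriving on the socket as one event label.
    val events = ssc.socketTextStream("localhost", 9999)
    val counts = events.map(label => (label, 1)).reduceByKey(_ + _)
    counts.print()   // per-batch counts; an online learner would update a model here

    ssc.start()
  }
}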

> 2.   This is a Scala/JVM question. How easy is it to interop with native 
> code (C++)? For me, it’s important to be able to use MKL, CUDA, and write 
> custom C++ code to utilize SIMD instructions.
> (It’s hard to talk about distributed computing when we haven’t optimized at 
> the single machine level.)

Java provides widely used facilities to talk to native code through the Java 
Native Interface (JNI), and wrappers around some common libraries already 
exist. For example, JBLAS (http://mikiobraun.github.io/jblas/) is a wrapper 
around BLAS, JavaCL (https://code.google.com/p/javacl/) covers OpenCL, and 
Intel has some examples on MKL: 
http://software.intel.com/sites/products/documentation/hpc/mkl/mkl_userguide_lnx/GUID-15EA8C86-7F31-4209-AD45-0D4E903F5445.htm.
 We use JBLAS in Spark.
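
As a tiny, self-contained JBLAS example (the values are arbitrary; the point is
just that a JNI-backed native library is called like any other Java library):

import org.jblas.DoubleMatrix

object JblasExample {
  def main(args: Array[String]) {
    // DoubleMatrix(rows, cols, values...) fills the matrix in column-major order.
    val a = new DoubleMatrix(2, 2, 1.0, 2.0, 3.0, 4.0)
    val b = new DoubleMatrix(2, 2, 5.0, 6.0, 7.0, 8.0)

    val c = a.mmul(b)   // matrix multiply, dispatched to native BLAS
    println(c)
  }
}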

Matei



Re: Biggest spark.akka.framesize possible

2013-12-08 Thread Shangyu Luo
I would like to know the maximum value for spark.akka.frameSize too, and I am
wondering whether it will affect the performance of reduceByKey().
Thanks!


2013/12/8 Matei Zaharia 

> Hey Matt,
>
> This setting shouldn’t really affect groupBy operations, because they
> don’t go through Akka. The frame size setting is for messages from the
> master to workers (specifically, sending out tasks), and for results that
> go directly from workers to the application (e.g. collect()). So it
> shouldn’t be a problem unless these are large. In Spark 0.8.1, results back
> to the master will be sent in a different way if they’re large, so the
> setting will only cover task sizes.
>
> Matei
>
> On Dec 7, 2013, at 10:20 PM, Matt Cheah  wrote:
>
>  Hi everyone,
>
>  I'm noticing like others that group-By operations with large sized
> groups gives Spark some trouble. Increasing the spark.akka.frameSize
> property alleviates it up to a point.
>
>  I was wondering what the maximum setting for this value is. I've seen
> previous e-mails talking about the ramifications of turning up this value,
> but I was wondering what the actual maximum number that could be set for it
> is. I'll benchmark the performance hit accordingly.
>
>  Thanks!
>
>  -Matt Cheah
>
>
>


-- 
--

Shangyu, Luo


Newbie questions

2013-12-08 Thread Kenneth Tran
Hi,



1.   Is Spark suited for online learning algorithms? From what I've read so
far (mainly from this slide), it seems not, but I could be wrong.



2.   This is a Scala/JVM question. How easy is it to interop with native 
code (C++)? For me, it's important to be able to use MKL, CUDA, and write 
custom C++ code to utilize SIMD instructions.

(It's hard to talk about distributed computing when we haven't optimized at the 
single machine level.)



I'm new to Spark, so apologies if you find my questions too naïve :).



Thanks.

-Ken


Re: cache()ing local variables?

2013-12-08 Thread Meisam Fathi
I asked the same question of the Spark community a while ago
(http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3CCAByMnGtm2s2tyqLzw%2BMdGqgNBLbfhE6-kkZ4OPY4ANfZaDSu7Q%40mail.gmail.com%3E).
This is my understanding of how Spark works, but I'd like one of the
Spark maintainers to confirm it.

There are only two ways to remove RDD data from cache:
1) When programmers explicitly call unpersist() or other API on an RDD.
2) When Spark's cache is full, Spark automatically spills cached data
to disk and frees up main memory.

Spark uses an LRU policy for cache eviction, which means Spark will
eventually remove cached data, but it may keep that data in the cache longer
than needed. In your case, the cache allocated to stationsAboveHWM could
be reclaimed as soon as the getMoveReco() method returns, but
stationsAboveHWM will remain in the cache.
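
A minimal, self-contained sketch of option (1), with arbitrary data: cache an
intermediate RDD, force its evaluation, then unpersist() it so its blocks are
dropped right away instead of waiting for LRU eviction:

import org.apache.spark.SparkContext

object ExplicitUnpersist {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "ExplicitUnpersist")

    val diffs = sc.parallelize(Seq((1, 0, 3), (2, 2, 0), (3, 0, 0))).cache()
    val aboveHWM = diffs.filter { case (_, _, nahwm) => nahwm > 0 }.cache()

    println(aboveHWM.count())   // materializes (and caches) aboveHWM

    // Release the cached blocks explicitly once they are no longer needed.
    aboveHWM.unpersist()
    diffs.unpersist()

    sc.stop()
  }
}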

Thanks,
Meisam

On Sun, Dec 8, 2013 at 12:05 PM, K. Shankari  wrote:
> I have some local variables in a function that are generated by a shuffle
> operation. To improve performance, I chose to cache() them, with the
> assumption that they would be automatically removed from the cache when they
> were deallocated.
>
> However, I am not sure that this is the case.
>
> I recently changed my code to change from cacheing() one local variable to
> cache()ing 2-3, and I have consistently started running out of memory.
>
> Again, the underlying dataset is ~ 2MB, and while the source RDD for the
> groupBy is fairly large, the cached variables are pretty small ~ 10 rows of
> (Int, Int) and (Int, Int, Int)
>
> Can any of the maintainers clarify whether RDDs are supposed to be removed
> from the cache when their associated local variables are deallocated?
>
> Here's my shuffle operation:
>
>   def getFinalState() = {
> def maxTs(ss1: StationStatus, ss2: StationStatus) = {
>   if(ss1.ts > ss2.ts) ss1 else ss2
> }
> peer.map(ss => (ss.id, ss)).reduceByKey(maxTs)
>   }
>
> and here's where I cache it:
>
> def getMoveReco() = {
>   val finalStateRDD = stationStatusRDD.getFinalState
>   val currEmptyStations = finalStateRDD.filter{case(stnId, ss) =>
> ss.nBikes == 0}.cache
>   val currFullStations = finalStateRDD.filter{case(stnId, ss) =>
> ss.nEmpty == 0}.cache
>   // this is of the form (id, nBikesBelowLWM, nBikesAboveHWM)
>   val stationWaterMarkDiffRDD = getWaterMarkDiffRDD(finalStateRDD).cache
>   val stationsAboveHWM = stationWaterMarkDiffRDD.filter{case(id, nblwm,
> nahwm) => nahwm > 0}.map{case(s, l, h) => (s, h)}.cache
>   val stationsBelowLWM = stationWaterMarkDiffRDD.filter{case(id, nblwm,
> nahwm) => nblwm > 0}.map{case(s, l, h) => (s, l)}.cache
>   val balancedStations = stationWaterMarkDiffRDD.filter{case(id, nblwm,
> nahwm) => nblwm <= 0 && nahwm <= 0}.cache
> ...
> }
>
> Thanks,
> Shankari


cache()ing local variables?

2013-12-08 Thread K. Shankari
I have some local variables in a function that are generated by a shuffle
operation. To improve performance, I chose to cache() them, with the
assumption that they would be automatically removed from the cache when
they were deallocated.

However, I am not sure that this is the case.

I recently changed my code from cache()ing one local variable to cache()ing
2-3, and I have consistently started running out of memory.

Again, the underlying dataset is ~2 MB, and while the source RDD for the
groupBy is fairly large, the cached variables are pretty small: ~10 rows of
(Int, Int) and (Int, Int, Int).

Can any of the maintainers clarify whether RDDs are supposed to be removed
from the cache when their associated local variables are deallocated?

Here's my shuffle operation:

  def getFinalState() = {
def maxTs(ss1: StationStatus, ss2: StationStatus) = {
  if(ss1.ts > ss2.ts) ss1 else ss2
}
peer.map(ss => (ss.id, ss)).reduceByKey(maxTs)
  }

and here's where I cache it:

def getMoveReco() = {
  val finalStateRDD = stationStatusRDD.getFinalState
  val currEmptyStations = finalStateRDD.filter{case(stnId, ss) =>
ss.nBikes == 0}.cache
  val currFullStations = finalStateRDD.filter{case(stnId, ss) =>
ss.nEmpty == 0}.cache
  // this is of the form (id, nBikesBelowLWM, nBikesAboveHWM)
  val stationWaterMarkDiffRDD = getWaterMarkDiffRDD(finalStateRDD).cache
  val stationsAboveHWM = stationWaterMarkDiffRDD.filter{case(id, nblwm,
nahwm) => nahwm > 0}.map{case(s, l, h) => (s, h)}.cache
  val stationsBelowLWM = stationWaterMarkDiffRDD.filter{case(id, nblwm,
nahwm) => nblwm > 0}.map{case(s, l, h) => (s, l)}.cache
  val balancedStations = stationWaterMarkDiffRDD.filter{case(id, nblwm,
nahwm) => nblwm <= 0 && nahwm <= 0}.cache
...
}

Thanks,
Shankari


Re: Writing an RDD to Hive

2013-12-08 Thread Christopher Nguyen
Philip, fwiw we do go with including Shark as a dependency for our needs,
making a fat jar, and it works very well. It was quite a bit of pain what
with the Hadoop/Hive transitive dependencies, but for us it was worth it.

I hope that serves as an existence proof that says Mt Everest has been
climbed, likely by more than just ourselves. Going forward this should be
getting easier.
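
Separately, for the plain write-to-HDFS route (without Shark), one pattern that
avoids the manual LOAD DATA step is writing the RDD into a directory that a
Hive external table already points at. A hedged sketch follows; the paths,
field names, and table definition are all assumptions:

import org.apache.spark.SparkContext

object WriteForHive {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "WriteForHive")

    // Turn the (hypothetical, space-separated) log lines into tab-delimited
    // rows matching the Hive table schema: userid, url, bytes.
    val rows = sc.textFile("hdfs:///raw/access.log")
      .map(_.split(" "))
      .map(f => f(0) + "\t" + f(1) + "\t" + f(2))

    // Hive side, run once outside this job:
    //   CREATE EXTERNAL TABLE access_summary (userid STRING, url STRING, bytes BIGINT)
    //   PARTITIONED BY (run STRING)
    //   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    //   LOCATION '/warehouse/access_summary';
    // Then, after each Spark run:
    //   ALTER TABLE access_summary ADD PARTITION (run='001')
    //   LOCATION '/warehouse/access_summary/run=001';
    rows.saveAsTextFile("hdfs:///warehouse/access_summary/run=001")

    sc.stop()
  }
}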

--
Christopher T. Nguyen
Co-founder & CEO, Adatao 
linkedin.com/in/ctnguyen



On Fri, Dec 6, 2013 at 7:06 PM, Philip Ogren wrote:

>  I have a simple scenario that I'm struggling to implement.  I would like
> to take a fairly simple RDD generated from a large log file, perform some
> transformations on it, and write the results out such that I can perform a
> Hive query either from Hive (via Hue) or Shark.  I'm having troubles with
> the last step.  I am able to write my data out to HDFS and then execute a
> Hive create table statement followed by a load data statement as a separate
> step.  I really dislike this separate manual step and would like to be able
> to have it all accomplished in my Spark application.  To this end, I have
> investigated two possible approaches as detailed below - it's probably too
> much information so I'll ask my more basic question first:
>
> Does anyone have a basic recipe/approach for loading data in an RDD to a
> Hive table from a Spark application?
>
> 1) Load it into HBase via PairRDDFunctions.saveAsHadoopDataset.  There is
> a nice detailed email on how to do this 
> here.
> I didn't get very far thought because as soon as I added an hbase
> dependency (corresponding to the version of hbase we are running) to my
> pom.xml file, I had an slf4j dependency conflict that caused my current
> application to explode.  I tried the latest released version and the slf4j
> dependency problem went away but then the deprecated class
> TableOutputFormat no longer exists.  Even if loading the data into hbase
> were trivially easy (and the detailed email suggests otherwise) I would
> then need to query HBase from Hive which seems a little clunky.
>
> 2) So, I decided that Shark might be an easier option.  All the examples
> provided in their documentation seem to assume that you are using Shark as
> an interactive application from a shell.  Various threads I've seen seem to
> indicate that Shark isn't really intended to be used as dependency in your
> Spark code (see 
> this and
> that.)
> It follows then that one can't add a Shark dependency to a pom.xml file
> because Shark isn't released via Maven Central (that I can tell perhaps
> it's in some other repo?)  Of course, there are ways of creating a local
> dependency in maven but it starts to feel very hacky.
>
> I realize that I've given sufficient detail to expose my ignorance in a
> myriad of ways.  Please feel free to shine light on any of my
> misconceptions!
>
> Thanks,
> Philip
>
>


Re: Spark Import Issue

2013-12-08 Thread Matei Zaharia
I’m not sure you can have a star inside that quoted classpath argument (the
double quotes may keep the shell from expanding the *). Try using the JAR through
its full name, or link to Spark through Maven
(http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-java).

Matei

On Dec 6, 2013, at 9:50 AM, Garrett Hamers  wrote:

> Hello,
> I am new to the spark system, and I am trying to write a simple program to 
> get myself familiar with how spark works. I am currently having problem with 
> importing the spark package. I am getting the following compiler error: 
> package org.apache.spark.api.java does not exist. 
> I have spark-0.8.0-incubating install. I ran the commands: sbt/sbt compile, 
> sbt/sbt assembly, and sbt/sbt publish-local without any errors. My sql.java 
> file is located in the spark-0.8.0-incubating root directory. I tried to 
> compile the code using “javac sql.java” and “javac -cp 
> "assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating*.jar" 
> sql.java”.
> 
> Here is the code for sql.java:
> package shark;
> import java.io.Serializable;
> import java.util.List;
> import java.io.*;
> import org.apache.spark.api.java.*; //Issue is here
> public class sql implements Serializable { 
>   public static void main( String[] args) {
> System.out.println("Hello World");
>   }
> }
> 
> What do I need to do in order for java to import the spark code properly? Any 
> advice would be greatly appreciated.
> 
> Thank you,
> Garrett Hamers



Re: Biggest spark.akka.framesize possible

2013-12-08 Thread Matei Zaharia
Hey Matt,

This setting shouldn’t really affect groupBy operations, because they don’t go 
through Akka. The frame size setting is for messages from the master to workers 
(specifically, sending out tasks), and for results that go directly from 
workers to the application (e.g. collect()). So it shouldn’t be a problem 
unless these are large. In Spark 0.8.1, results back to the master will be sent 
in a different way if they’re large, so the setting will only cover task sizes.

Matei

On Dec 7, 2013, at 10:20 PM, Matt Cheah  wrote:

> Hi everyone,
> 
> I'm noticing like others that group-By operations with large sized groups 
> gives Spark some trouble. Increasing the spark.akka.frameSize property 
> alleviates it up to a point.
> 
> I was wondering what the maximum setting for this value is. I've seen 
> previous e-mails talking about the ramifications of turning up this value, 
> but I was wondering what the actual maximum number that could be set for it 
> is. I'll benchmark the performance hit accordingly.
> 
> Thanks!
> 
> -Matt Cheah