Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Patrick Wendell
Okay, I think I've isolated this a bit more. Let's discuss over on the JIRA:

https://issues.apache.org/jira/browse/SPARK-2075

On Sun, Jun 8, 2014 at 1:16 PM, Paul Brown  wrote:
>
> Hi, Patrick --
>
> Java 7 on the development machines:
>
> » java -version
> java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>
>
> And on the deployed boxes:
>
> $ java -version
> java version "1.7.0_55"
> OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1)
> OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
>
>
> Also, "unzip -l" in place of "jar tvf" gives the same results, so I don't
> think it's an issue with jar not reporting the files.  Also, the classes do
> get correctly packaged into the uberjar:
>
> unzip -l /target/[deleted]-driver.jar | grep 'rdd/RDD' | grep 'saveAs'
>  1519  06-08-14 12:05   org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>  1560  06-08-14 12:05   org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
>
>
> Best.
> -- Paul
>
> --
> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
>
> On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell  wrote:
>>
>> Paul,
>>
>> Could you give the version of Java that you are building with and the
>> version of Java you are running with? Are they the same?
>>
>> Just off the cuff, I wonder if this is related to:
>> https://issues.apache.org/jira/browse/SPARK-1520
>>
>> If it is, it could appear that certain functions are not in the jar
>> because they fall beyond the extended zip boundary, and `jar tvf` won't
>> list them.
>>
>> - Patrick
>>
>> On Sun, Jun 8, 2014 at 12:45 PM, Paul Brown  wrote:
>> > Moving over to the dev list, as this isn't a user-scope issue.
>> >
>> > I just ran into this issue with the missing saveAsTextFile, and here's a
>> > little additional information:
>> >
>> > - Code ported from 0.9.1 up to 1.0.0; works with local[n] in both cases.
>> > - Driver built as an uberjar via Maven.
>> > - Deployed to smallish EC2 cluster in standalone mode (S3 storage) with
>> > Spark 1.0.0-hadoop1 downloaded from Apache.
>> >
>> > Given that it functions correctly in local mode but not in a standalone
>> > cluster, this suggests to me that the issue lies in a difference between
>> > the Maven version and the hadoop1 version.
>> >
>> > In the spirit of taking the computer at its word, we can just have a look
>> > in the JAR files.  Here's what's in the Maven dep as of 1.0.0:
>> >
>> > jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>> >   1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>> >   1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
>> >
>> >
>> > And here's what's in the hadoop1 distribution:
>> >
>> > jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
>> >
>> >
>> > I.e., it's not there.  It is in the hadoop2 distribution:
>> >
>> > jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>> >   1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>> >   1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
>> >
>> >
>> > So something's clearly broken with the way that the distribution
>> > assemblies are created.
>> >
>> > FWIW and IMHO, the "right" way to publish the hadoop1 and hadoop2
>> > flavors of Spark to Maven Central would be as *entirely different*
>> > artifacts (spark-core-h1, spark-core-h2).
>> >
>> > Logged as SPARK-2075.
>> >
>> > Cheers.
>> > -- Paul
>> >
>> >
>> >
>> > --
>> > p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>> >
>> >
>> > On Fri, Jun 6, 2014 at 2:45 AM, HenriV  wrote:
>> >
>> >> I'm experiencing the same error while upgrading from 0.9.1 to 1.0.0.
>> >> I'm using Google Compute Engine and Cloud Storage, but saveAsTextFile is
>> >> returning errors whether saving to the cloud or saving locally. When I
>> >> start a job in the cluster I do get an error, but after that it keeps on
>> >> running fine until the saveAsTextFile. (I don't know if the two are
>> >> connected.)
>> >>
>> >> ---Error at job startup---
>> >>  ERROR metrics.MetricsSystem: Sink class
>> >> org.apache.spark.metrics.sink.MetricsServlet cannot be instantialized
>> >> java.lang.reflect.InvocationTargetException
>> >> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> >> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> >> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> >> at
>> >> 

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Sean Owen
I suspect Patrick is right about the cause. The Maven artifact that
was released does contain this class (phew):

http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-core_2.10%7C1.0.0%7Cjar

As to the hadoop1 / hadoop2 artifact question -- agreed, that is often
done. Here the working theory seems to be to depend on the one artifact
(whose API should be identical regardless of dependencies) and then
customize the hadoop-client dep; as a result, there are not two versions
deployed to Maven at all.


On Sun, Jun 8, 2014 at 4:02 PM, Patrick Wendell  wrote:
> Paul,
>
> Could you give the version of Java that you are building with and the
> version of Java you are running with? Are they the same?
>
> Just off the cuff, I wonder if this is related to:
> https://issues.apache.org/jira/browse/SPARK-1520
>
> If it is, it could appear that certain functions are not in the jar
> because they fall beyond the extended zip boundary, and `jar tvf` won't
> list them.


Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Patrick Wendell
Also I should add - thanks for taking the time to help narrow this down!

On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell  wrote:
> Paul,
>
> Could you give the version of Java that you are building with and the
> version of Java you are running with? Are they the same?
>
> Just off the cuff, I wonder if this is related to:
> https://issues.apache.org/jira/browse/SPARK-1520
>
> If it is, it could appear that certain functions are not in the jar
> because they fall beyond the extended zip boundary, and `jar tvf` won't
> list them.
>
> - Patrick
>
> On Sun, Jun 8, 2014 at 12:45 PM, Paul Brown  wrote:
>> Moving over to the dev list, as this isn't a user-scope issue.
>>
>> I just ran into this issue with the missing saveAsTextFile, and here's a
>> little additional information:
>>
>> - Code ported from 0.9.1 up to 1.0.0; works with local[n] in both cases.
>> - Driver built as an uberjar via Maven.
>> - Deployed to smallish EC2 cluster in standalone mode (S3 storage) with
>> Spark 1.0.0-hadoop1 downloaded from Apache.
>>
>> Given that it functions correctly in local mode but not in a standalone
>> cluster, this suggests to me that the issue lies in a difference between
>> the Maven version and the hadoop1 version.
>>
>> In the spirit of taking the computer at its word, we can just have a look
>> in the JAR files.  Here's what's in the Maven dep as of 1.0.0:
>>
>> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>>   1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>>   1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
>>
>>
>> And here's what's in the hadoop1 distribution:
>>
>> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
>>
>>
>> I.e., it's not there.  It is in the hadoop2 distribution:
>>
>> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>>   1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>>   1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
>>
>>
>> So something's clearly broken with the way that the distribution assemblies
>> are created.
>>
>> FWIW and IMHO, the "right" way to publish the hadoop1 and hadoop2 flavors
>> of Spark to Maven Central would be as *entirely different* artifacts
>> (spark-core-h1, spark-core-h2).
>>
>> Logged as SPARK-2075.
>>
>> Cheers.
>> -- Paul
>>
>>
>>
>> --
>> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>
>>
>> On Fri, Jun 6, 2014 at 2:45 AM, HenriV  wrote:
>>
>>> I'm experiencing the same error while upgrading from 0.9.1 to 1.0.0.
>>> I'm using Google Compute Engine and Cloud Storage, but saveAsTextFile is
>>> returning errors whether saving to the cloud or saving locally. When I
>>> start a job in the cluster I do get an error, but after that it keeps on
>>> running fine until the saveAsTextFile. (I don't know if the two are
>>> connected.)
>>>
>>> ---Error at job startup---
>>>  ERROR metrics.MetricsSystem: Sink class
>>> org.apache.spark.metrics.sink.MetricsServlet cannot be instantialized
>>> java.lang.reflect.InvocationTargetException
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:136)
>>> at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:130)
>>> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>>> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>>> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>>> at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:130)
>>> at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:84)
>>> at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:167)
>>> at org.apache.spark.SparkEnv$.create(SparkEnv.scala:230)
>>> at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
>>> at Hello$.main(Hello.scala:101)
>>> at Hello.main(Hello.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Patrick Wendell
Paul,

Could you give the version of Java that you are building with and the
version of Java you are running with? Are they the same?

Just off the cuff, I wonder if this is related to:
https://issues.apache.org/jira/browse/SPARK-1520

If it is, it could appear that certain functions are not in the jar
because they fall beyond the extended zip boundary, and `jar tvf` won't
list them.

- Patrick
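
If you want to take `jar tvf` out of the equation entirely, here is a
minimal Scala sketch that enumerates entries through java.util.jar
instead (JarGrep and its argument handling are illustrative, not anything
in Spark):

    import java.util.jar.JarFile
    import scala.collection.JavaConverters._

    // Greps a jar's entry names by reading the zip central directory
    // directly, rather than trusting `jar tvf` output.
    object JarGrep {
      def main(args: Array[String]): Unit = {
        val jar = new JarFile(args(0))
        try {
          jar.entries().asScala
            .map(_.getName)
            .filter(n => n.contains("rdd/RDD") && n.contains("saveAs"))
            .foreach(println)
        } finally jar.close()
      }
    }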

On Sun, Jun 8, 2014 at 12:45 PM, Paul Brown  wrote:
> Moving over to the dev list, as this isn't a user-scope issue.
>
> I just ran into this issue with the missing saveAsTextFile, and here's a
> little additional information:
>
> - Code ported from 0.9.1 up to 1.0.0; works with local[n] in both cases.
> - Driver built as an uberjar via Maven.
> - Deployed to smallish EC2 cluster in standalone mode (S3 storage) with
> Spark 1.0.0-hadoop1 downloaded from Apache.
>
> Given that it functions correctly in local mode but not in a standalone
> cluster, this suggests to me that the issue lies in a difference between
> the Maven version and the hadoop1 version.
>
> In the spirit of taking the computer at its word, we can just have a look
> in the JAR files.  Here's what's in the Maven dep as of 1.0.0:
>
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
>
>
> And here's what's in the hadoop1 distribution:
>
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
>
>
> I.e., it's not there.  It is in the hadoop2 distribution:
>
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
>
>
> So something's clearly broken with the way that the distribution assemblies
> are created.
>
> FWIW and IMHO, the "right" way to publish the hadoop1 and hadoop2 flavors
> of Spark to Maven Central would be as *entirely different* artifacts
> (spark-core-h1, spark-core-h2).
>
> Logged as SPARK-2075.
>
> Cheers.
> -- Paul
>
>
>
> --
> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
>
> On Fri, Jun 6, 2014 at 2:45 AM, HenriV  wrote:
>
>> I'm experiencing the same error while upgrading from 0.9.1 to 1.0.0.
>> I'm using Google Compute Engine and Cloud Storage, but saveAsTextFile is
>> returning errors whether saving to the cloud or saving locally. When I
>> start a job in the cluster I do get an error, but after that it keeps on
>> running fine until the saveAsTextFile. (I don't know if the two are
>> connected.)
>>
>> ---Error at job startup---
>>  ERROR metrics.MetricsSystem: Sink class
>> org.apache.spark.metrics.sink.MetricsServlet cannot be instantialized
>> java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:136)
>> at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:130)
>> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>> at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:130)
>> at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:84)
>> at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:167)
>> at org.apache.spark.SparkEnv$.create(SparkEnv.scala:230)
>> at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
>> at Hello$.main(Hello.scala:101)
>> at Hello.main(Hello.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>>

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Paul Brown
Moving over to the dev list, as this isn't a user-scope issue.

I just ran into this issue with the missing saveAsTextFile, and here's a
little additional information:

- Code ported from 0.9.1 up to 1.0.0; works with local[n] in both cases.
- Driver built as an uberjar via Maven.
- Deployed to smallish EC2 cluster in standalone mode (S3 storage) with
Spark 1.0.0-hadoop1 downloaded from Apache.

Given that it functions correctly in local mode but not in a standalone
cluster, this suggests to me that the issue lies in a difference between
the Maven version and the hadoop1 version.
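
For reference, the failing call is nothing exotic; a stripped-down driver
of the same shape looks like this (the app name and output path are
placeholders, not the actual code):

    import org.apache.spark.{SparkConf, SparkContext}

    object SaveDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("save-demo"))
        // Runs fine under local[n]; on the standalone cluster the job dies
        // inside saveAsTextFile because the anonfun classes are missing.
        sc.parallelize(1 to 100).map(_.toString).saveAsTextFile("s3n://some-bucket/out")
        sc.stop()
      }
    }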

In the spirit of taking the computer at its word, we can just have a look
in the JAR files.  Here's what's in the Maven dep as of 1.0.0:

jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
  1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
  1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class


And here's what's in the hadoop1 distribution:

jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'


I.e., it's not there.  It is in the hadoop2 distribution:

jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
  1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
  1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class


So something's clearly broken with the way that the distribution assemblies
are created.

FWIW and IMHO, the "right" way to publish the hadoop1 and hadoop2 flavors
of Spark to Maven Central would be as *entirely different* artifacts
(spark-core-h1, spark-core-h2).

Logged as SPARK-2075.

Cheers.
-- Paul



--
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Fri, Jun 6, 2014 at 2:45 AM, HenriV  wrote:

> I'm experiencing the same error while upgrading from 0.9.1 to 1.0.0.
> I'm using Google Compute Engine and Cloud Storage, but saveAsTextFile is
> returning errors whether saving to the cloud or saving locally. When I
> start a job in the cluster I do get an error, but after that it keeps on
> running fine until the saveAsTextFile. (I don't know if the two are
> connected.)
>
> ---Error at job startup---
>  ERROR metrics.MetricsSystem: Sink class
> org.apache.spark.metrics.sink.MetricsServlet cannot be instantialized
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:136)
> at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:130)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
> at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:130)
> at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:84)
> at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:167)
> at org.apache.spark.SparkEnv$.create(SparkEnv.scala:230)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
> at Hello$.main(Hello.scala:101)
> at Hello.main(Hello.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sbt.Run.invokeMain(Run.scala:72)
> at sbt.Run.run0(Run.scala:65)
> at sbt.Run.sbt$Run$$execute$1(Run.scala:54)
> at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:58)
> at sbt.Run$$anonfun$run$1.apply(Run.scala:58)
> at sbt.Run$$anonfun$run$1.apply(Run.scala:58)
> at sbt.Logger$$anon$4.apply(Logger.scala:90)
> at sbt.TrapExit$App.run(TrapExit.scala:244)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
> at
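
As an aside, the "Caused by" at the bottom of that trace points at a
Jackson version conflict: JsonFactory.requiresPropertyOrdering() was added
in jackson-core 2.3, so an older jackson-core is probably winning on the
classpath. A possible sbt-side workaround (the 2.3.0 version is an
assumption; match whatever Spark actually pulls in):

    // Force a consistent Jackson version across the dependency graph.
    dependencyOverrides ++= Set(
      "com.fasterxml.jackson.core" % "jackson-core" % "2.3.0",
      "com.fasterxml.jackson.core" % "jackson-databind" % "2.3.0"
    )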

MIMA Compatibility Checks

2014-06-08 Thread Patrick Wendell
Hey All,

Some people may have noticed PR failures due to binary compatibility
checks. We've had these enabled in several of the sub-modules since
the 0.9.0 release, but we've now turned them on in Spark core
post-1.0.0, which has much higher churn.

The checks are based on the "migration manager" (MIMA) tool from
Typesafe. One issue is that the tool doesn't support package-private
declarations of classes or methods. Prashant Sharma has built
instrumentation that adds partial support for package-privacy (via a
workaround), but since there isn't really native support for this in
MIMA we are still finding cases in which we trigger false positives.

In the next week or two we'll make it a priority to handle more of
these false-positive cases. In the meantime, users can add manual
excludes to:

project/MimaExcludes.scala

to avoid triggering warnings for certain issues.
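
For example, an entry in that file generally has the following shape (the
filter type and the fully qualified name below are placeholders; pick
ones matching the reported problem):

    import com.typesafe.tools.mima.core._

    // Each filter suppresses one reported incompatibility.
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.SomeClass.someMethod")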

This is definitely annoying - sorry about that. Unfortunately, we are
the first open source Scala project ever to do this, so we are dealing
with uncharted territory.

Longer term, I'd actually like to see us write our own sbt-based tool to
do this in a better way (we've had trouble trying to extend MIMA itself;
for example, it contains code copy-pasted from an old version of the
Scala compiler). If someone in the community is a Scala fan and wants to
take that on, I'm happy to give more details.

- Patrick


Apache Spark and Swift object store

2014-06-08 Thread Gil Vernik
Hello everyone,

I would like to initiate a discussion about integrating Apache Spark and
OpenStack Swift.
(https://issues.apache.org/jira/browse/SPARK-938 was created a while ago.)

I created a patch (https://github.com/apache/spark/pull/1010) that
provides initial information on how to connect Swift and Spark. Currently
it targets Hadoop 2.3.0 and only the standalone mode of Spark. This patch
is mainly intended to give the community a way to experiment with this
integration. I have it fully working on my private cluster, and it works
very well, allowing me to run various analytics with Spark.

My next planned patches will include information on how to configure
Swift for other Spark cluster deployments, and on how to integrate Spark
and Swift with earlier versions of Hadoop. I am confident that the
integration between Spark and Swift is a very important feature that
will greatly benefit Spark's exposure.

The integration between Spark and Swift is very similar to how Spark 
integrates with S3.
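
To make the analogy concrete, here is a sketch of what Swift-backed I/O
looks like from a Spark job, following the hadoop-openstack conventions
(the "myprovider" service name, credentials, and container are all
placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    object SwiftDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("swift-demo"))
        val hc = sc.hadoopConfiguration
        // Keystone auth settings for a Swift service named "myprovider".
        hc.set("fs.swift.service.myprovider.auth.url", "https://auth.example.com/v2.0/tokens")
        hc.set("fs.swift.service.myprovider.username", "user")
        hc.set("fs.swift.service.myprovider.password", "secret")
        hc.set("fs.swift.service.myprovider.tenant", "tenant")

        // URLs take the form swift://CONTAINER.SERVICE/path, much like s3n://.
        val lines = sc.textFile("swift://mycontainer.myprovider/input.txt")
        lines.saveAsTextFile("swift://mycontainer.myprovider/output")
        sc.stop()
      }
    }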

It will be great to hear comments / suggestions / remarks from the community!

All the best,
Gil Vernik.