Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sree V
Sorry, I couldn't catch up before the voting closed. If it still counts: the
mvn package build (1) fails, so I didn't run the tests (2). So, -1.
1. mvn -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.6.0 -DskipTests clean package
2. mvn -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.6.0 test
Error:
[INFO] Spark Project External Flume Sink .. SUCCESS [ 39.561 s]
[INFO] Spark Project External Flume ... FAILURE [ 11.212 s]
[INFO] Spark Project External MQTT  SKIPPED
[INFO] Spark Project External ZeroMQ .. SKIPPED
[INFO] Spark Project External Kafka ... SKIPPED
[INFO] Spark Project Examples . SKIPPED
[INFO] Spark Project YARN Shuffle Service . SKIPPED


Thanking you.

With Regards
Sree 


 On Thursday, April 16, 2015 3:42 PM, Patrick Wendell pwend...@gmail.com 
wrote:
   

 I'm gonna go ahead and close this now - thanks everyone for voting!

This vote passes with 7 +1 votes (6 binding) and no 0 or -1 votes.

+1:
Mark Hamstra*
Reynold Xin
Krishna Sankar
Sean Owen*
Tom Graves*
Joseph Bradley*
Sean McNamara*

0:

-1:

Thanks!
- Patrick

On Thu, Apr 16, 2015 at 3:27 PM, Sean Owen so...@cloudera.com wrote:
 No, of course Jenkins runs tests. The way some of the tests work, they
 need the build artifacts to have been created first. So it runs mvn
 ... -DskipTests package then mvn ... test

 On Thu, Apr 16, 2015 at 11:09 PM, Sree V sree_at_ch...@yahoo.com wrote:
 In my effort to vote for this release, I found these issues along the way:

 This is from jenkins.  It uses -DskipTests.

 [centos] $
 /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.0.5/bin/mvn
 -Dhadoop.version=2.0.0-mr1-cdh4.1.2 -Dlabel=centos -DskipTests clean package

 We build on our local machines / servers using the same flag.


 Usually, for releases, we build with all the tests running as well, and with
 some level of code coverage.

 Are we bypassing that?



 Thanking you.

 With Regards
 Sree



 On Wednesday, April 15, 2015 3:32 PM, Sean McNamara
 sean.mcnam...@webtrends.com wrote:


 Ran tests on OS X

 +1

 Sean


 On Apr 14, 2015, at 10:59 PM, Patrick Wendell pwend...@gmail.com wrote:

 I'd like to close this vote to coincide with the 1.3.1 release,
 however, it would be great to have more people test this release
 first. I'll leave it open for a bit longer and see if others can give
 a +1.

 On Tue, Apr 14, 2015 at 9:55 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 +1 from me as well.

 On Tue, Apr 7, 2015 at 4:36 AM, Sean Owen so...@cloudera.com wrote:
 I think that's close enough for a +1:

 Signatures and hashes are good.
 LICENSE, NOTICE still check out.
 Compiles for a Hadoop 2.6 + YARN + Hive profile.

 JIRAs with target version = 1.2.x look legitimate; no blockers.

 I still observe several Hive test failures with:
 mvn -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.6.0
 -DskipTests clean package; mvn -Phadoop-2.4 -Pyarn -Phive
 -Phive-0.13.1 -Dhadoop.version=2.6.0 test
 .. though again I think these are not regressions but known issues in
 older branches.

 FYI there are 16 Critical issues still open for 1.2.x:

 SPARK-6209,ExecutorClassLoader can leak connections after failing to
 load classes from the REPL class server,Josh Rosen,In Progress,4/5/15
 SPARK-5098,Number of running tasks become negative after tasks
 lost,,Open,1/14/15
 SPARK-4888,Spark EC2 doesn't mount local disks for i2.8xlarge
 instances,,Open,1/27/15
 SPARK-4879,Missing output partitions after job completes with
 speculative execution,Josh Rosen,Open,3/5/15
 SPARK-4568,Publish release candidates under $VERSION-RCX instead of
 $VERSION,Patrick Wendell,Open,11/24/14
 SPARK-4520,SparkSQL exception when reading certain columns from a
 parquet file,sadhan sood,Open,1/21/15
 SPARK-4514,SparkContext localProperties does not inherit property
 updates across thread reuse,Josh Rosen,Open,3/31/15
 SPARK-4454,Race condition in DAGScheduler,Josh Rosen,Reopened,2/18/15
 SPARK-4452,Shuffle data structures can starve others on the same
 thread for memory,Tianshuo Deng,Open,1/24/15
 SPARK-4356,Test Scala 2.11 on Jenkins,Patrick Wendell,Open,11/12/14
 SPARK-4258,NPE with new Parquet Filters,Cheng Lian,Reopened,4/3/15
 SPARK-4194,Exceptions thrown during SparkContext or SparkEnv
 construction might lead to resource leaks or corrupted global
 state,,In Progress,4/2/15
 SPARK-4159,Maven build doesn't run JUnit test suites,Sean
 Owen,Open,1/11/15
 SPARK-4106,Shuffle write and spill to disk metrics are
 incorrect,,Open,10/28/14
 SPARK-3492,Clean up Yarn integration code,Andrew Or,Open,9/12/14
 SPARK-3461,Support external groupByKey using
 repartitionAndSortWithinPartitions,Sandy Ryza,Open,11/10/14
 SPARK-2984,FileNotFoundException on _temporary directory,,Open,12/11/14
 SPARK-2532,Fix issues with consolidated shuffle,,Open,3/26/15
 SPARK-1312,Batch should read based on the batch interval provided in
 the 

Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sean Owen
Sree, that doesn't show any error, so it doesn't help. I built with the
same flags when I tested and it succeeded.


Why does the HDFS parquet file generated by Spark SQL have different size with those on Tachyon?

2015-04-17 Thread zhangxiongfei
Hi,
I did some tests on Parquet Files with Spark SQL DataFrame API.
I generated 36 gzip-compressed Parquet files with Spark SQL and stored them on
Tachyon; the size of each file is about 222M. Then I read them with the code below:
val tfs = sqlContext.parquetFile("tachyon://datanode8.bitauto.dmp:19998/apps/tachyon/adClick")
Next, I save this DataFrame to HDFS with the code below. It also generates 36
Parquet files, but the size of each file is about 265M:
tfs.repartition(36).saveAsParquetFile("/user/zhangxf/adClick-parquet-tachyon")
My question is: why do the files on HDFS have a different size from those on
Tachyon, even though they come from the same original data?


Thanks
Zhang Xiongfei



Re: Gitter chat room for Spark

2015-04-17 Thread Sean Owen
There are N chat options out there, and of course there's no need or
way to stop people from using them. If 1 is blessed as 'best', it
excludes others who prefer a different one. Tomorrow there will be a
New Best Chat App. If a bunch are blessed, the conversation fractures.

There's also a principle that important-ish discussions should take
place on official project discussion forums, i.e., the mailing lists.
Chat is just chat but sometimes discussions appropriate to the list
happen there.

For this reason I've always thought it best to punt on official-ish
chat and let it happen organically as it will, with a request to port
any important discussions to the list.


On Fri, Apr 17, 2015 at 7:41 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
 Freenode already has a fairly active channel under #Apache-spark; I think Josh
 idles there sometimes.

 Thanks
 Best Regards

 On Fri, Apr 17, 2015 at 3:33 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 Would we be interested in having a public chat room?

 Gitter http://gitter.im offers them for free for open source projects.
 It's like web-based IRC.

 Check out the Docker room for example:

 https://gitter.im/docker/docker

 And if people prefer to use actual IRC, Gitter offers a bridge for that
 https://irc.gitter.im/ to their service.

 All we need is someone who's a member of the Apache GitHub group to create
 a room for us.

 It should show up under

 https://gitter.im/apache/spark

 when it's ready.

 What do y'all think?

 Nick





Addition of new Metrics for killed executors.

2015-04-17 Thread Archit Thakur
Hi,

We are planning to add new metrics in Spark for the executors that got
killed during execution. I was just curious why this info is not already
present. Is there some reason for not adding it? Any ideas are welcome.

Thanks and Regards,
Archit Thakur.


[Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Hi everyone,
I had an issue trying to use Spark SQL from Java (8 or 7). I tried to
reproduce it in a small test case close to the actual documentation
https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection,
so sorry for the long mail, but this is Java:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Movie implements Serializable {
    private int id;
    private String name;

    public Movie(int id, String name) {
        this.id = id;
        this.name = name;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

public class SparkSQLTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setAppName("My Application");
        conf.setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        ArrayList<Movie> movieArrayList = new ArrayList<Movie>();
        movieArrayList.add(new Movie(1, "Indiana Jones"));

        JavaRDD<Movie> movies = sc.parallelize(movieArrayList);

        SQLContext sqlContext = new SQLContext(sc);
        DataFrame frame = sqlContext.applySchema(movies, Movie.class);
        frame.registerTempTable("movies");

        sqlContext.sql("select name from movies")
                .map(row -> row.getString(0)) // this is what I would expect to work
                .collect();
    }
}


But this does not compile; here's the compilation error:

[ERROR]
/Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/MainSQL.java:[37,47]
method map in class org.apache.spark.sql.DataFrame cannot be applied to
given types;
[ERROR] required:
scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
[ERROR] found: (row)->Na[...]ng(0)
[ERROR] reason: cannot infer type-variable(s) R
[ERROR] (actual and formal argument lists differ in length)
[ERROR]
/Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/SampleSHit.java:[56,17]
method map in class org.apache.spark.sql.DataFrame cannot be applied to
given types;
[ERROR] required:
scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
[ERROR] found: (row)->row[...]ng(0)
[ERROR] reason: cannot infer type-variable(s) R
[ERROR] (actual and formal argument lists differ in length)
[ERROR] -> [Help 1]

Because in the DataFrame the map method is defined as:

[image: inline image 1]

And once this is translated to bytecode, the actual Java signature uses a
Function1 and adds a ClassTag parameter.
I can try to work around this and use scala.reflect.ClassTag$ like this:

ClassTag$.MODULE$.apply(String.class)

to get the second ClassTag parameter right, but then instantiating a
java.util.Function or using the Java 8 lambdas fails to work, and if I
try to instantiate a proper Scala Function1... well, this is a world of
pain.

This is a regression introduced by the 1.3.x DataFrame, because
JavaSchemaRDD used to be a JavaRDDLike but DataFrames are not (and are
not callable with JFunctions). I can open a Jira if you want?
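
To illustrate, here is a rough, untested Java sketch of that direct route;
the class names are made up for this example and are not from the thread:

import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;
import scala.runtime.AbstractFunction1;

// Rough sketch only: calling DataFrame.map directly from Java by hand-building
// the Scala Function1 and the ClassTag it expects.
public class PainfulDirectMap {
    // AbstractFunction1 is not Serializable, so a real job needs a Serializable
    // subclass like this one; exactly the kind of friction described above.
    static class GetName extends AbstractFunction1<Row, String> implements java.io.Serializable {
        @Override
        public String apply(Row row) {
            return row.getString(0);
        }
    }

    public static RDD<String> names(DataFrame frame) {
        ClassTag<String> tag = ClassTag$.MODULE$.apply(String.class);
        return frame.map(new GetName(), tag);
    }
}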

Regards,

-- 
*Olivier Girardot* | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94


Fwd: Addition of new Metrics for killed executors.

2015-04-17 Thread Archit Thakur
-- Forwarded message --
From: Archit Thakur archit279tha...@gmail.com
Date: Fri, Apr 17, 2015 at 4:07 PM
Subject: Addition of new Metrics for killed executors.
To: u...@spark.incubator.apache.org, u...@spark.apache.org,
d...@spark.incubator.apache.org


Hi,

We are planning to add new metrics in Spark for the executors that got
killed during execution. I was just curious why this info is not already
present. Is there some reason for not adding it? Any ideas are welcome.

Thanks and Regards,
Archit Thakur.


Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Yes thanks !

On Fri, Apr 17, 2015 at 16:20, Ted Yu yuzhih...@gmail.com wrote:

 The image didn't go through.

 I think you were referring to:
   override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)

 Cheers



Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Ted Yu
The image didn't go through.

I think you were referring to:
  override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)

Cheers



Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Reynold Xin
I think in 1.3 and above, you'd need to do

.sql(...).javaRDD().map(..)
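
For reference, a minimal untested sketch of that workaround in Java, assuming
Spark 1.3.x and Java 8 (the "movies" temp table and the helper name are just
illustrative, not from this thread):

import java.util.List;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Sketch: DataFrame.map expects a Scala Function1 plus a ClassTag, so from
// Java drop down to a JavaRDD<Row> first and map over that instead.
public class DataFrameMapWorkaround {
    public static List<String> movieNames(SQLContext sqlContext) {
        DataFrame frame = sqlContext.sql("select name from movies");
        return frame.javaRDD()                 // JavaRDD<Row>
                    .map(row -> row.getString(0))
                    .collect();
    }
}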



Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Ok, do you want me to open a pull request to fix the dedicated
documentation ?

On Fri, Apr 17, 2015 at 18:14, Reynold Xin r...@databricks.com wrote:

 I think in 1.3 and above, you'd need to do

 .sql(...).javaRDD().map(..)



Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Reynold Xin
Please do! Thanks.


On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot 
o.girar...@lateral-thoughts.com wrote:

 Ok, do you want me to open a pull request to fix the dedicated
 documentation ?



BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-17 Thread Nipun Batra
Hi

The example given in the SQL document
https://spark.apache.org/docs/latest/sql-programming-guide.html

org.apache.spark.sql.Row does not exist in the Java API, or at least I was
not able to find it.

Build Info - Downloaded from spark website

Dependency:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.3.0</version>
  <scope>provided</scope>
</dependency>

Code in documentation

// Import factory methods provided by DataType.
import org.apache.spark.sql.types.DataType;
// Import StructType and StructField
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.types.StructField;
// Import Row.
import org.apache.spark.sql.Row;

// sc is an existing JavaSparkContext.
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

// Load a text file and convert each line to a JavaBean.
JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");

// The schema is encoded in a string
String schemaString = "name age";

// Generate the schema based on the string of schema
List<StructField> fields = new ArrayList<StructField>();
for (String fieldName : schemaString.split(" ")) {
  fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
}
StructType schema = DataType.createStructType(fields);

// Convert records of the RDD (people) to Rows.
JavaRDD<Row> rowRDD = people.map(
  new Function<String, Row>() {
    public Row call(String record) throws Exception {
      String[] fields = record.split(",");
      return Row.create(fields[0], fields[1].trim());
    }
  });

// Apply the schema to the RDD.
DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);

// Register the DataFrame as a table.
peopleDataFrame.registerTempTable("people");

// SQL can be run over RDDs that have been registered as tables.
DataFrame results = sqlContext.sql("SELECT name FROM people");

// The results of SQL queries are DataFrames and support all the normal RDD operations.
// The columns of a row in the result can be accessed by ordinal.
List<String> names = results.map(new Function<Row, String>() {
  public String call(Row row) {
    return "Name: " + row.getString(0);
  }
}).collect();


Thanks
Nipun


Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Reynold Xin
No, there isn't a convention. Although if you want to show Java 8, you
should also show Java 6/7 syntax, since there are still more Java 7 users than Java 8.


On Fri, Apr 17, 2015 at 3:36 PM, Olivier Girardot 
o.girar...@lateral-thoughts.com wrote:

 Is there any convention *not* to show java 8 versions in the documentation
 ?


Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-17 Thread Olivier Girardot
Hi Nipun,
I'm sorry, but I don't understand exactly what your problem is.
Regarding org.apache.spark.sql.Row, it does exist in the Spark SQL
dependency.
Is it a compilation problem?
Are you trying to run a main method using the pom you've just described,
or are you trying to spark-submit the jar?
If you're trying to run a main method, the provided scope is not designed
for that and will make your program fail.

Regards,

Olivier.



Re: Why does the HDFS parquet file generated by Spark SQL have different size with those on Tachyon?

2015-04-17 Thread Reynold Xin
It's because you did a repartition -- which rearranges all the data.

Parquet uses all kinds of compression techniques, such as dictionary
encoding and run-length encoding, which can result in a size difference
when the data is ordered differently.
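
To make that concrete, here is a minimal, untested Java sketch under the same
assumptions as the thread (Spark 1.3.x DataFrame API); the paths are
placeholders, not taken from the question:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Sketch only: writing the same data with and without a repartition() can
// produce different Parquet file sizes, because the shuffle rearranges rows
// and breaks up runs of similar values that dictionary and run-length
// encoding would otherwise compress well.
public class ParquetSizeComparison {
    public static void write(SQLContext sqlContext) {
        DataFrame df = sqlContext.parquetFile("/placeholder/input");       // placeholder path

        // Preserves the incoming row order, as in the original Tachyon files.
        df.saveAsParquetFile("/placeholder/as-read");                      // placeholder path

        // Shuffles rows into an arbitrary order before writing, which can
        // inflate the encoded size, as observed on HDFS in this thread.
        df.repartition(36).saveAsParquetFile("/placeholder/shuffled");     // placeholder path
    }
}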




Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
another PR I guess :) here's the associated Jira
https://issues.apache.org/jira/browse/SPARK-6988

On Fri, Apr 17, 2015 at 23:00, Reynold Xin r...@databricks.com wrote:

 No there isn't a convention. Although if you want to show java 8, you
 should also show java 6/7 syntax since there are still more 7 users than 8.



Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
and the PR: https://github.com/apache/spark/pull/5564

Thank you !

Olivier.

On Fri, Apr 17, 2015 at 23:00, Reynold Xin r...@databricks.com wrote:

 No there isn't a convention. Although if you want to show java 8, you
 should also show java 6/7 syntax since there are still more 7 users than 8.


 On Fri, Apr 17, 2015 at 3:36 PM, Olivier Girardot 
 o.girar...@lateral-thoughts.com wrote:

 Is there any convention *not* to show java 8 versions in the
 documentation ?

 Le ven. 17 avr. 2015 à 21:39, Reynold Xin r...@databricks.com a écrit :

 Please do! Thanks.


 On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot 
 o.girar...@lateral-thoughts.com wrote:

 Ok, do you want me to open a pull request to fix the dedicated
 documentation ?

 Le ven. 17 avr. 2015 à 18:14, Reynold Xin r...@databricks.com a
 écrit :

 I think in 1.3 and above, you'd need to do

 .sql(...).javaRDD().map(..)

 On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot 
 o.girar...@lateral-thoughts.com wrote:

 Yes thanks !

 Le ven. 17 avr. 2015 à 16:20, Ted Yu yuzhih...@gmail.com a écrit :

  The image didn't go through.
 
  I think you were referring to:
override def map[R: ClassTag](f: Row = R): RDD[R] = rdd.map(f)
 
  Cheers
 
  On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot 
  o.girar...@lateral-thoughts.com wrote:
 
   Hi everyone,
   I had an issue trying to use Spark SQL from Java (8 or 7), I
 tried to
   reproduce it in a small test case close to the actual
 documentation
   
 
 https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection
  ,
   so sorry for the long mail, but this is Java :
  
   import org.apache.spark.api.java.JavaRDD;
   import org.apache.spark.api.java.JavaSparkContext;
   import org.apache.spark.sql.DataFrame;
   import org.apache.spark.sql.SQLContext;
  
   import java.io.Serializable;
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
  
   class Movie implements Serializable {
   private int id;
   private String name;
  
   public Movie(int id, String name) {
   this.id = id;
   this.name = name;
   }
  
   public int getId() {
   return id;
   }
  
   public void setId(int id) {
   this.id = id;
   }
  
   public String getName() {
   return name;
   }
  
   public void setName(String name) {
   this.name = name;
   }
   }
  
   public class SparkSQLTest {
   public static void main(String[] args) {
   SparkConf conf = new SparkConf();
   conf.setAppName(My Application);
   conf.setMaster(local);
   JavaSparkContext sc = new JavaSparkContext(conf);
  
   ArrayListMovie movieArrayList = new ArrayListMovie();
   movieArrayList.add(new Movie(1, Indiana Jones));
  
   JavaRDDMovie movies = sc.parallelize(movieArrayList);
  
   SQLContext sqlContext = new SQLContext(sc);
   DataFrame frame = sqlContext.applySchema(movies,
 Movie.class);
   frame.registerTempTable(movies);
  
   sqlContext.sql(select name from movies)
  
   *.map(row - row.getString(0)) // this is what i
 would
  expect to work *.collect();
   }
   }
  
  
   But this does not compile, here's the compilation error :
  
    [ERROR]
    /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/MainSQL.java:[37,47]
    method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
    [ERROR] required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
    [ERROR] found: (row)->Na[...]ng(0)
    [ERROR] reason: cannot infer type-variable(s) R
    [ERROR] (actual and formal argument lists differ in length)
    [ERROR]
    /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/SampleSHit.java:[56,17]
    method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
    [ERROR] required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
    [ERROR] found: (row)->row[...]ng(0)
    [ERROR] reason: cannot infer type-variable(s) R
    [ERROR] (actual and formal argument lists differ in length)
    [ERROR] -> [Help 1]
  
    Because in the DataFrame the *map* method is defined as:

    [image: Inline image 1]
  
    And once this is translated to bytecode, the actual Java signature uses a
    Function1 and adds a ClassTag parameter.
    I can try to work around this and use the scala.reflect.ClassTag$ like this:

    ClassTag$.MODULE$.apply(String.class)

    to get the second ClassTag parameter right, but then instantiating a
    java.util.Function or using the Java 8 lambdas fails to work, and if I try
    to instantiate a proper Scala Function1... well, this is a world of pain.
  
   This is a regression introduced by the 1.3.x DataFrame because
  JavaSchemaRDD used to be JavaRDDLike but DataFrame's are not (and
 are 

Re: dataframe can not find fields after loading from hive

2015-04-17 Thread Reynold Xin
This is strange. cc the dev list since it might be a bug.



On Thu, Apr 16, 2015 at 3:18 PM, Cesar Flores ces...@gmail.com wrote:

 Never mind. I found the solution:

 val newDataFrame = hc.createDataFrame(hiveLoadedDataFrame.rdd,
 hiveLoadedDataFrame.schema)

  which translates to converting the data frame to an RDD and back again to a
  data frame. Not the prettiest solution, but at least it solves my problem.
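
  For Java users, a roughly equivalent sketch (untested; it assumes the same
  HiveContext hc and loaded DataFrame hiveLoadedDataFrame as above) would be:

  DataFrame rebuilt = hc.createDataFrame(hiveLoadedDataFrame.javaRDD(),
  hiveLoadedDataFrame.schema()); // same round trip: DataFrame -> RDD of Rows -> DataFrame
  DataFrame filtered = rebuilt.filter(rebuilt.col("label").equalTo(1));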


 Thanks,
 Cesar Flores



 On Thu, Apr 16, 2015 at 11:17 AM, Cesar Flores ces...@gmail.com wrote:


  I have a data frame in which I load data from a Hive table, and my issue
  is that the data frame is missing the columns that I need to query.

 For example:

  val newdataset = dataset.where(dataset("label") === 1)

 gives me an error like the following:

  ERROR yarn.ApplicationMaster: User class threw exception: resolved
  attributes label missing from label, user_id, ... (the rest of the fields of
  my table)
 org.apache.spark.sql.AnalysisException: resolved attributes label missing
 from label, user_id, ... (the rest of the fields of my table)

 where we can see that the label field actually exist. I manage to solve
 this issue by updating my syntax to:

  val newdataset = dataset.where($"label" === 1)

  which works. However, I cannot use this trick in all my queries. For
  example, when I try to do a unionAll of two subsets of the same data
  frame, the error I get is that all my fields are missing.

  Can someone tell me if I need to do some post-processing after loading
  from Hive in order to avoid this kind of error?


 Thanks
 --
 Cesar Flores




 --
 Cesar Flores



Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Is there any convention *not* to show Java 8 versions in the documentation?

On Fri, Apr 17, 2015 at 9:39 PM, Reynold Xin r...@databricks.com wrote:

 Please do! Thanks.


 On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot 
 o.girar...@lateral-thoughts.com wrote:

  OK, do you want me to open a pull request to fix the dedicated
  documentation?

  On Fri, Apr 17, 2015 at 6:14 PM, Reynold Xin r...@databricks.com wrote:

 I think in 1.3 and above, you'd need to do

 .sql(...).javaRDD().map(..)

 On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot 
 o.girar...@lateral-thoughts.com wrote:

  Yes, thanks!

  On Fri, Apr 17, 2015 at 4:20 PM, Ted Yu yuzhih...@gmail.com wrote:

  The image didn't go through.
 
  I think you were referring to:
    override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)
 
  Cheers
 
  On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot 
  o.girar...@lateral-thoughts.com wrote:
 
    Hi everyone,
    I had an issue trying to use Spark SQL from Java (8 or 7); I tried to
    reproduce it in a small test case close to the actual documentation
    (https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection),
    so sorry for the long mail, but this is Java:
  
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
   import org.apache.spark.sql.DataFrame;
   import org.apache.spark.sql.SQLContext;
  
   import java.io.Serializable;
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
  
   class Movie implements Serializable {
   private int id;
   private String name;
  
   public Movie(int id, String name) {
   this.id = id;
   this.name = name;
   }
  
   public int getId() {
   return id;
   }
  
   public void setId(int id) {
   this.id = id;
   }
  
   public String getName() {
   return name;
   }
  
   public void setName(String name) {
   this.name = name;
   }
   }
  
    public class SparkSQLTest {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf();
            conf.setAppName("My Application");
            conf.setMaster("local");
            JavaSparkContext sc = new JavaSparkContext(conf);

            ArrayList<Movie> movieArrayList = new ArrayList<Movie>();
            movieArrayList.add(new Movie(1, "Indiana Jones"));

            JavaRDD<Movie> movies = sc.parallelize(movieArrayList);

            SQLContext sqlContext = new SQLContext(sc);
            DataFrame frame = sqlContext.applySchema(movies, Movie.class);
            frame.registerTempTable("movies");

            sqlContext.sql("select name from movies")
                    .map(row -> row.getString(0)) // this is what I would expect to work
                    .collect();
        }
    }
  
  
   But this does not compile, here's the compilation error :
  
    [ERROR]
    /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/MainSQL.java:[37,47]
    method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
    [ERROR] required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
    [ERROR] found: (row)->Na[...]ng(0)
    [ERROR] reason: cannot infer type-variable(s) R
    [ERROR] (actual and formal argument lists differ in length)
    [ERROR]
    /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/SampleSHit.java:[56,17]
    method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
    [ERROR] required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
    [ERROR] found: (row)->row[...]ng(0)
    [ERROR] reason: cannot infer type-variable(s) R
    [ERROR] (actual and formal argument lists differ in length)
    [ERROR] -> [Help 1]
  
    Because in the DataFrame the *map* method is defined as:

    [image: Inline image 1]
  
    And once this is translated to bytecode, the actual Java signature uses a
    Function1 and adds a ClassTag parameter.
    I can try to work around this and use the scala.reflect.ClassTag$ like this:

    ClassTag$.MODULE$.apply(String.class)

    to get the second ClassTag parameter right, but then instantiating a
    java.util.Function or using the Java 8 lambdas fails to work, and if I try
    to instantiate a proper Scala Function1... well, this is a world of pain.
  
    This is a regression introduced by the 1.3.x DataFrame, because
    JavaSchemaRDD used to be JavaRDDLike but DataFrames are not (and are
    not callable with JFunctions). Shall I open a JIRA for this?
  
   Regards,
  
   --
   *Olivier Girardot* | Associé
   o.girar...@lateral-thoughts.com
   +33 6 24 09 17 94
  
 






Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sree V
Cleaned up ~/.m2 and ~/.zinc and received the exact same error again. So, -1 from me.

[INFO] 
[INFO] Building Spark Project External Flume 1.2.2
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ 
spark-streaming-flume_2.10 ---
[INFO] Deleting /root/sources/github/spark/external/flume/target
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-versions) @ 
spark-streaming-flume_2.10 ---
[INFO] 
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-scala-sources) @ 
spark-streaming-flume_2.10 ---
[INFO] Source directory: 
/root/sources/github/spark/external/flume/src/main/scala added.
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
spark-streaming-flume_2.10 ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
spark-streaming-flume_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/root/sources/github/spark/external/flume/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ 
spark-streaming-flume_2.10 ---
[WARNING] Zinc server is not available at port 3030 - reverting to normal 
incremental compile
[INFO] Using incremental compilation
[INFO] compiler plugin: 
BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[INFO] Compiling 6 Scala sources and 1 Java source to 
/root/sources/github/spark/external/flume/target/scala-2.10/classes...
[ERROR] 
/root/sources/github/spark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeBatchFetcher.scala:22:
 object Throwables is not a member of package com.google.common.base
[ERROR] import com.google.common.base.Throwables
[ERROR]    ^
[ERROR] 
/root/sources/github/spark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeBatchFetcher.scala:59:
 not found: value Throwables
[ERROR]   Throwables.getRootCause(e) match {
[ERROR]   ^
[ERROR] 
/root/sources/github/spark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:26:
 object util is not a member of package com.google.common
[ERROR] import com.google.common.util.concurrent.ThreadFactoryBuilder
[ERROR]  ^
[ERROR] 
/root/sources/github/spark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:69:
 not found: type ThreadFactoryBuilder
[ERROR] Executors.newCachedThreadPool(new 
ThreadFactoryBuilder().setDaemon(true).
[ERROR]   ^
[ERROR] 
/root/sources/github/spark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:76:
 not found: type ThreadFactoryBuilder
[ERROR] new ThreadFactoryBuilder().setDaemon(true).setNameFormat("Flume Receiver Thread - %d").build())
[ERROR] ^
[ERROR] 5 errors found
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM ... SUCCESS [01:44 min]
[INFO] Spark Project Networking ... SUCCESS [ 49.128 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  8.503 s]
[INFO] Spark Project Core . SUCCESS [05:22 min]
[INFO] Spark Project Bagel  SUCCESS [ 25.647 s]
[INFO] Spark Project GraphX ... SUCCESS [01:13 min]
[INFO] Spark Project Streaming  SUCCESS [01:29 min]
[INFO] Spark Project Catalyst . SUCCESS [01:51 min]
[INFO] Spark Project SQL .. SUCCESS [01:57 min]
[INFO] Spark Project ML Library ... SUCCESS [02:25 min]
[INFO] Spark Project Tools  SUCCESS [ 16.665 s]
[INFO] Spark Project Hive . SUCCESS [02:03 min]
[INFO] Spark Project REPL . SUCCESS [ 50.294 s]
[INFO] Spark Project YARN Parent POM .. SUCCESS [  5.777 s]
[INFO] Spark Project YARN Stable API .. SUCCESS [ 53.803 s]
[INFO] Spark Project Assembly . SUCCESS [ 59.515 s]
[INFO] Spark Project External Twitter . SUCCESS [ 40.038 s]
[INFO] Spark Project External Flume Sink .. SUCCESS [ 32.779 s]
[INFO] Spark Project External Flume ... FAILURE [  7.936 s]
[INFO] Spark Project External MQTT  SKIPPED
[INFO] Spark Project External ZeroMQ .. SKIPPED
[INFO] Spark Project External Kafka ... SKIPPED
[INFO] Spark Project Examples . SKIPPED
[INFO] Spark Project YARN Shuffle Service 

Announcing Spark 1.3.1 and 1.2.2

2015-04-17 Thread Patrick Wendell
Hi All,

I'm happy to announce the Spark 1.3.1 and 1.2.2 maintenance releases.
We recommend all users on the 1.3 and 1.2 Spark branches upgrade to
these releases, which contain several important bug fixes.

Download Spark 1.3.1 or 1.2.2:
http://spark.apache.org/downloads.html

Release notes:
1.3.1: http://spark.apache.org/releases/spark-release-1-3-1.html
1.2.2:  http://spark.apache.org/releases/spark-release-1-2-2.html

Comprehensive list of fixes:
1.3.1: http://s.apache.org/spark-1.3.1
1.2.2: http://s.apache.org/spark-1.2.2

Thanks to everyone who worked on these releases!

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sree V
Hi Sean,
This is from the build log. I made a master branch build earlier on this
machine. Do you think it needs a cleanup of the .m2 folder, as you suggested
once earlier? Giving it another try while you take a look at this.

[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ 
spark-streaming-flume_2.10 ---
[WARNING] Zinc server is not available at port 3030 - reverting to normal 
incremental compile
[INFO] Using incremental compilation
[INFO] compiler plugin: 
BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[INFO] Compiling 6 Scala sources and 1 Java source to 
/root/sources/github/spark/external/flume/target/scala-2.10/classes...
[ERROR] 
/root/sources/github/spark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeBatchFetcher.scala:22:
 object Throwables is not a member of package com.google.common.base
[ERROR] import com.google.common.base.Throwables
[ERROR]    ^
[ERROR] 
/root/sources/github/spark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeBatchFetcher.scala:59:
 not found: value Throwables
[ERROR]   Throwables.getRootCause(e) match {
[ERROR]   ^
[ERROR] 
/root/sources/github/spark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:26:
 object util is not a member of package com.google.common
[ERROR] import com.google.common.util.concurrent.ThreadFactoryBuilder
[ERROR]  ^
[ERROR] 
/root/sources/github/spark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:69:
 not found: type ThreadFactoryBuilder
[ERROR] Executors.newCachedThreadPool(new ThreadFactoryBuilder().setDaemon(true).
[ERROR]   ^
[ERROR] 
/root/sources/github/spark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:76:
 not found: type ThreadFactoryBuilder
[ERROR] new ThreadFactoryBuilder().setDaemon(true).setNameFormat("Flume Receiver Thread - %d").build())
[ERROR] ^
[ERROR] 5 errors found
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM ... SUCCESS [ 15.894 s]
[INFO] Spark Project Networking ... SUCCESS [ 20.801 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [ 18.111 s]
[INFO] Spark Project Core . SUCCESS [08:09 min]
[INFO] Spark Project Bagel  SUCCESS [ 43.592 s]
[INFO] Spark Project GraphX ... SUCCESS [01:55 min]
[INFO] Spark Project Streaming  SUCCESS [03:02 min]
[INFO] Spark Project Catalyst . SUCCESS [02:59 min]
[INFO] Spark Project SQL .. SUCCESS [03:09 min]
[INFO] Spark Project ML Library ... SUCCESS [03:24 min]
[INFO] Spark Project Tools  SUCCESS [ 24.816 s]
[INFO] Spark Project Hive . SUCCESS [02:14 min]
[INFO] Spark Project REPL . SUCCESS [01:12 min]
[INFO] Spark Project YARN Parent POM .. SUCCESS [  6.080 s]
[INFO] Spark Project YARN Stable API .. SUCCESS [01:27 min]
[INFO] Spark Project Assembly . SUCCESS [01:22 min]
[INFO] Spark Project External Twitter . SUCCESS [ 35.881 s]
[INFO] Spark Project External Flume Sink .. SUCCESS [ 39.561 s]
[INFO] Spark Project External Flume ... FAILURE [ 11.212 s]
[INFO] Spark Project External MQTT  SKIPPED
[INFO] Spark Project External ZeroMQ .. SKIPPED
[INFO] Spark Project External Kafka ... SKIPPED
[INFO] Spark Project Examples . SKIPPED
[INFO] Spark Project YARN Shuffle Service . SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 32:36 min
[INFO] Finished at: 2015-04-16T23:02:18-07:00
[INFO] Final Memory: 91M/2043M
[INFO] 
[ERROR] Failed to execute goal 
net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project 
spark-streaming-flume_2.10: Execution scala-compile-first of goal 
net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed. CompileFailed -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 

Re: Spark streaming vs. spark usage

2015-04-17 Thread Nathan Kronenfeld
I finally got this compiling and working, I think, but since (as Reynold
points out) it involves a little API refactoring, I was hoping to get some
discussion about it going as soon as possible.

I have the changes necessary to give RDD, DStream, and DataFrame some level
of common interface, in https://github.com/apache/spark/pull/5565, and
would very much appreciate comments.

Thanks,
Nathan

On Thu, Dec 19, 2013 at 12:42 AM, Reynold Xin r...@apache.org wrote:


 On Wed, Dec 18, 2013 at 12:17 PM, Nathan Kronenfeld 
 nkronenf...@oculusinfo.com wrote:



 Since many of the functions exist in parallel between the two, I guess I
 would expect something like:

 trait BasicRDDFunctions {
 def map...
 def reduce...
 def filter...
 def foreach...
 }

 class RDD extends  BasicRDDFunctions...
 class DStream extends BasicRDDFunctions...


 I like this idea. We should discuss more about it on the dev list. It
 would require refactoring some APIs, but does lead to better unification.
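
A rough Java-style sketch of the common-interface idea discussed above
(hypothetical names only, not the actual API proposed in
https://github.com/apache/spark/pull/5565):

import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.VoidFunction;

// A shared super-interface that RDD-like, DStream-like and DataFrame-like
// wrappers could all expose, so generic code can be written once against it.
interface BasicFunctions<T> {
    <R> BasicFunctions<R> map(Function<T, R> f);
    BasicFunctions<T> filter(Function<T, Boolean> f);
    T reduce(Function2<T, T, T> f);
    void foreach(VoidFunction<T> f);
}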