Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sree V
Sorry, I couldn't catch up before the voting closed. If it still counts, mvn package fails (1), and I didn't run the tests (2). So, -1. 1. mvn -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.6.0 -DskipTests clean package 2. mvn -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.6.0

Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sean Owen
Sree, that doesn't show any error, so it doesn't help. I built with the same flags when I tested and it succeeded. On Fri, Apr 17, 2015 at 8:53 AM, Sree V sree_at_ch...@yahoo.com.invalid wrote: Sorry, I couldn't catch up before closing the voting. If it still counts, mvn package fails (1). And

Why does the HDFS parquet file generated by Spark SQL have different size with those on Tachyon?

2015-04-17 Thread zhangxiongfei
Hi, I did some tests on Parquet files with the Spark SQL DataFrame API. I generated 36 gzip-compressed Parquet files with Spark SQL and stored them on Tachyon. The size of each file is about 222M. Then I read them with the code below. val tfs

Re: Gitter chat room for Spark

2015-04-17 Thread Sean Owen
There are N chat options out there, and of course there's no need or way to stop people from using them. If 1 is blessed as 'best', it excludes others who prefer a different one. Tomorrow there will be a New Best Chat App. If a bunch are blessed, the conversation fractures. There's also a

Addition of new Metrics for killed executors.

2015-04-17 Thread Archit Thakur
Hi, We are planning to add new Metrics in Spark for the executors that got killed during the execution. I was just curious why this info is not already present. Is there some reason for not adding it? Any ideas are welcome. Thanks and Regards, Archit Thakur.

[Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Hi everyone, I had an issue trying to use Spark SQL from Java (8 or 7). I tried to reproduce it in a small test case close to the actual documentation https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection, so sorry for the long mail, but this is Java

Fwd: Addition of new Metrics for killed executors.

2015-04-17 Thread Archit Thakur
-- Forwarded message -- From: Archit Thakur archit279tha...@gmail.com Date: Fri, Apr 17, 2015 at 4:07 PM Subject: Addition of new Metrics for killed executors. To: u...@spark.incubator.apache.org, u...@spark.apache.org, d...@spark.incubator.apache.org Hi, We are planning to add

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Yes, thanks! On Fri, Apr 17, 2015 at 4:20 PM, Ted Yu yuzhih...@gmail.com wrote: The image didn't go through. I think you were referring to: override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f) Cheers On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Ted Yu
The image didn't go through. I think you were referring to: override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f) Cheers On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Hi everyone, I had an issue trying to use Spark SQL from Java (8 or

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Reynold Xin
I think in 1.3 and above, you'd need to do .sql(...).javaRDD().map(..) On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Yes, thanks! On Fri, Apr 17, 2015 at 4:20 PM, Ted Yu yuzhih...@gmail.com wrote: The image didn't go through. I think you

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Ok, do you want me to open a pull request to fix the dedicated documentation? On Fri, Apr 17, 2015 at 6:14 PM, Reynold Xin r...@databricks.com wrote: I think in 1.3 and above, you'd need to do .sql(...).javaRDD().map(..) On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Reynold Xin
Please do! Thanks. On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Ok, do you want me to open a pull request to fix the dedicated documentation? On Fri, Apr 17, 2015 at 6:14 PM, Reynold Xin r...@databricks.com wrote: I think in 1.3 and above,

BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-17 Thread Nipun Batra
Hi, the example given in the SQL document https://spark.apache.org/docs/latest/sql-programming-guide.html uses org.apache.spark.sql.Row, which does not exist in the Java API, or at least I was not able to find it. Build Info - Downloaded from spark website Dependency dependency

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Reynold Xin
No, there isn't a convention. Although if you want to show Java 8, you should also show Java 6/7 syntax since there are still more 7 users than 8. On Fri, Apr 17, 2015 at 3:36 PM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Is there any convention *not* to show Java 8 versions in

Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-17 Thread Olivier Girardot
Hi Nipun, I'm sorry but I don't understand exactly what your problem is. Regarding org.apache.spark.sql.Row, it does exist in the Spark SQL dependency. Is it a compilation problem? Are you trying to run a main method using the pom you've just described? Or are you trying to spark-submit

Re: Why does the HDFS parquet file generated by Spark SQL have different size with those on Tachyon?

2015-04-17 Thread Reynold Xin
It's because you did a repartition -- which rearranges all the data. Parquet uses all kinds of compression techniques such as dictionary encoding and run-length encoding, which would result in the size difference when the data is ordered differently. On Fri, Apr 17, 2015 at 4:51 AM, zhangxiongfei
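
The point about ordering can be illustrated with a toy sketch. This is not Parquet's actual encoder; the class name RunLengthOrderDemo and the method runLengthEncode are hypothetical names made up for this illustration. It only shows that a simple run-length encoding of the same values produces a different number of runs, and therefore a different encoded size, depending on how the values are ordered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only: a hypothetical, simplified run-length encoder,
// not the encoding that Parquet itself implements.
public class RunLengthOrderDemo {

    // Encode a column as a list of {value, runLength} pairs.
    public static List<int[]> runLengthEncode(int[] column) {
        List<int[]> runs = new ArrayList<>();
        for (int value : column) {
            int[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (last != null && last[0] == value) {
                last[1]++;                      // extend the current run
            } else {
                runs.add(new int[] {value, 1}); // start a new run
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // Same 12 values, two orderings: interleaved (0,1,2,0,1,2,...) and sorted.
        int[] interleaved = new int[12];
        for (int i = 0; i < interleaved.length; i++) {
            interleaved[i] = i % 3;
        }
        int[] sorted = interleaved.clone();
        Arrays.sort(sorted);

        System.out.println("runs when interleaved: " + runLengthEncode(interleaved).size());
        System.out.println("runs when sorted:      " + runLengthEncode(sorted).size());
    }
}
```

With the interleaved ordering every value starts a new run (12 runs), while the sorted ordering collapses into 3 runs, so the same data encodes to very different sizes. A repartition that reshuffles rows can have a similar effect on real column encodings.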

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Another PR I guess :) Here's the associated Jira: https://issues.apache.org/jira/browse/SPARK-6988 On Fri, Apr 17, 2015 at 11:00 PM, Reynold Xin r...@databricks.com wrote: No, there isn't a convention. Although if you want to show Java 8, you should also show Java 6/7 syntax since there are still

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
And the PR: https://github.com/apache/spark/pull/5564 Thank you! Olivier. On Fri, Apr 17, 2015 at 11:00 PM, Reynold Xin r...@databricks.com wrote: No, there isn't a convention. Although if you want to show Java 8, you should also show Java 6/7 syntax since there are still more 7 users than 8.

Re: dataframe can not find fields after loading from hive

2015-04-17 Thread Reynold Xin
This is strange. cc the dev list since it might be a bug. On Thu, Apr 16, 2015 at 3:18 PM, Cesar Flores ces...@gmail.com wrote: Never mind. I found the solution: val newDataFrame = hc.createDataFrame(hiveLoadedDataFrame.rdd, hiveLoadedDataFrame.schema) which translates to convert the data

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Is there any convention *not* to show Java 8 versions in the documentation? On Fri, Apr 17, 2015 at 9:39 PM, Reynold Xin r...@databricks.com wrote: Please do! Thanks. On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Ok, do you want me to open a

Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sree V
Cleaned up ~/.m2 and ~/.zinc. Received the exact same error again. So, -1 from me. [INFO] [INFO] Building Spark Project External Flume 1.2.2 [INFO] [INFO]

Announcing Spark 1.3.1 and 1.2.2

2015-04-17 Thread Patrick Wendell
Hi All, I'm happy to announce the Spark 1.3.1 and 1.2.2 maintenance releases. We recommend all users on the 1.3 and 1.2 Spark branches upgrade to these releases, which contain several important bug fixes. Download Spark 1.3.1 or 1.2.2: http://spark.apache.org/downloads.html Release notes:

Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sree V
Hi Sean, This is from the build log. I made a master branch build earlier on this machine. Do you think it needs a cleanup of the .m2 folder, as you suggested earlier? Giving it another try while you take a look at this. [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first)

Re: Spark streaming vs. spark usage

2015-04-17 Thread Nathan Kronenfeld
I finally got this compiling and working, I think, but since (as Reynold points out) it involves a little API refactoring, I was hoping to get some discussion about it going as soon as possible. I have the changes necessary to give RDD, DStream, and DataFrame some level of common interface, in