Re: Re: How can I get the column data based on specific column name and then stored these data in array or list ?

2015-12-25 Thread Yanbo Liang
Actually you can call df.collect_list("a").
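The two suggestions in this thread (collecting a column to the driver, and aggregating it into an array) can be sketched as below. This is an illustrative sketch, not code from the thread: the DataFrame df and its string column "a" are assumptions, and note that collect_list is an aggregate function in org.apache.spark.sql.functions (used inside agg) rather than a method on DataFrame.

```scala
// Sketch, assuming Spark 1.6-era APIs and a hypothetical DataFrame `df`
// with a string column "a".
import org.apache.spark.sql.functions.collect_list

// Option 1: collect the rows of that column to the driver and
// extract the string value from each Row.
val values: Array[String] = df.select("a").collect().map(_.getString(0))

// Option 2: aggregate the whole column into a single array-typed cell
// with the collect_list aggregate function, then read it back locally.
val asList: Seq[String] = df.agg(collect_list("a")).first().getSeq[String](0)
```

Option 1 pulls every row to the driver, so it is only suitable for small results; Option 2 likewise materializes the full column on a single node.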

Re: Re: How can I get the column data based on specific column name and then stored these data in array or list ?

2015-12-25 Thread zml张明磊
Yes, it’s a good method. But UDF? What is UDF? U…..D…F? OK, I can learn from it. Thanks, Minglei. From: Jeff Zhang [mailto:zjf...@gmail.com] Sent: 2015-12-25 16:00 To: zml张明磊 Cc: dev@spark.apache.org Subject: Re: Re: How can I get the column data based on specific column name and

Re: Re: How can I get the column data based on specific column name and then stored these data in array or list ?

2015-12-25 Thread zml张明磊
Oh??? I will give it a try. Thanks, Minglei. From: Yanbo Liang [mailto:yblia...@gmail.com] Sent: 2015-12-25 16:07 To: Jeff Zhang Cc: zml张明磊; dev@spark.apache.org Subject: Re: Re: How can I get the column data based on specific column name and then stored these data in array or list ? Actually you can

Re: A proposal for Spark 2.0

2015-12-25 Thread Tao Wang
How about the Hive dependency? We use the ThriftServer, the serdes, and even the parser/execution logic from Hive. What direction will we take for this part? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/A-proposal-for-Spark-2-0-tp15122p15793.html Sent from the

Re: latest Spark build error

2015-12-25 Thread salexln
One more question: is there a way to build only MLlib from the command line? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/latest-Spark-build-error-tp15782p15794.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: 答复: How can I get the column data based on specific column name and then stored these data in array or list ?

2015-12-25 Thread Jeff Zhang
You can use a udf to convert one column to array type. Here's one sample:

val conf = new SparkConf().setMaster("local[4]").setAppName("test")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
import sqlContext._
sqlContext.udf.register("f", (a:String)
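Jeff's sample is cut off by the archive. A plausible completion is sketched below; it is a sketch only, and the split-on-comma body, the udf name "f", and the registration calls are assumptions rather than the original code. The function itself is plain Scala and can be exercised without Spark:

```scala
// Hypothetical completion of the truncated sample: a function intended
// to be registered as a UDF that turns a comma-separated string column
// into an array column.
val toArray: String => Array[String] = (a: String) => a.split(",")

// With a Spark 1.6-era SQLContext it would be registered and used as:
//   sqlContext.udf.register("f", toArray)
//   sqlContext.sql("SELECT f(a) FROM t")

// The conversion logic itself needs no Spark to verify:
println(toArray("x,y,z").mkString("|"))
```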

Re: latest Spark build error

2015-12-25 Thread Allen Zhang
Try the -pl option in the mvn command, and append -am or -amd for more options. For instance: mvn clean install -pl :spark-mllib_2.10 -DskipTests
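Allen's tip can be spelled out as below. This is a sketch of the standard Maven reactor flags, assuming the Spark 1.x module layout (the :spark-mllib_2.10 artifact id comes from the thread; whether -am or -amd is appropriate depends on what you changed):

```shell
# Build only the MLlib module; -pl ("projects list") selects it by artifact id.
mvn clean install -pl :spark-mllib_2.10 -DskipTests

# -am ("also make") additionally builds the modules MLlib depends on,
# useful after pulling changes that touch upstream modules like core.
mvn clean install -pl :spark-mllib_2.10 -am -DskipTests

# -amd ("also make dependents") additionally builds modules that depend
# on MLlib, useful to check you have not broken downstream modules.
mvn clean install -pl :spark-mllib_2.10 -amd -DskipTests
```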

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread vaquar khan
+1 On 24 Dec 2015 22:01, "Vinay Shukla" wrote: > +1 > Tested on HDP 2.3, YARN cluster mode, spark-shell > > On Wed, Dec 23, 2015 at 6:14 AM, Allen Zhang > wrote: > >> >> +1 (non-binding) >> >> I have just tarball a new binary and tested

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread Bhupendra Mishra
+1 On Fri, Dec 25, 2015 at 8:31 PM, vaquar khan wrote: > +1

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread Ricardo Almeida
+1 (non-binding). Tested the Python API, Spark Core, Spark SQL, and Spark MLlib on a standalone cluster. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-6-0-RC4-tp15747p15800.html Sent from the Apache Spark Developers List mailing

recurring test failures against hadoop-2.4 profile

2015-12-25 Thread Ted Yu
Hi, You may have noticed the following test failures: org.apache.spark.sql.hive.execution.HiveUDFSuite.UDFIntegerToString org.apache.spark.sql.hive.execution.SQLQuerySuite.udf_java_method Tracing backwards, they started failing since this build:

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread Krishna Sankar
+1 (non-binding, of course)
1. Compiled on OS X 10.10 (Yosemite) OK. Total time: 29:25 min. mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
2. Tested pyspark, mllib (iPython 4.0)
2.0. Spark version is 1.6.0
2.1. Statistics (min, max, mean, Pearson, Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread Ted Yu
I found that the SBT build for Scala 2.11 has been failing ( https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull ). I logged SPARK-12527 and sent a PR. FYI On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust