java.io.NotSerializableException: org.apache.spark.sql.TypedColumn

2018-08-26 Thread zzcclp
Hi dev: I am using spark-shell to run the example in the section 'http://spark.apache.org/docs/2.2.2/sql-programming-guide.html#type-safe-user-defined-aggregate-functions', and I get an error: Caused by: java.io.NotSerializableException: org.apache.spark.sql.TypedColumn Serialization

Spark 2.1.1 Error: java.lang.NoSuchMethodError: org.apache.spark.network.client.TransportClient.getChannel()Lio/netty/channel/Channel;

2017-07-17 Thread zzcclp
Hi guys: I am using Spark 2.1.1 to test on CDH 5.7.1. When I run on YARN with the following command, the error 'NoSuchMethodError: org.apache.spark.network.client.TransportClient.getChannel()Lio/netty/channel/Channel;' sometimes appears: command: su cloudera-scm -s "/bin/sh" -c

Spark Maven Test error

2015-03-25 Thread zzcclp
I use the following commands to run the unit tests: ./make-distribution.sh --tgz --skip-java-test -Pscala-2.10 -Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn -Dyarn.version=2.3.0-cdh5.1.2 -Dhadoop.version=2.3.0-cdh5.1.2 mvn -Pscala-2.10 -Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn

Spark SQL udf(ScalaUdf) is very slow

2015-03-23 Thread zzcclp
My test env: 1. Spark version is 1.3.0; 2. 3 nodes, each 80G/20C; 3. read 250G of Parquet files from HDFS. Test case: 1. register a floor func with the command: sqlContext.udf.register("floor", (ts: Int) => ts - ts % 300), then run the SQL select chan, floor(ts) as tt, sum(size) from qlogbase3 group by chan,